Predictive analytics: Obliterated
Thoughts
Questions
- What deficit are we talking about? Is it covered by the mRS?
- Should the pre-treatment mRS be in the model? Already covered by the others?
Backlog
- Honest estimation
- G estimation (using only significant ones to get accurate ORs)
- Make sure that the number of outcomes (26) is valid.
- Bootstrap the whole process - how many models include each of the features?
- The SuSiE method: https://www.biorxiv.org/content/10.1101/501114v1
- Choose best model with forward/backward exclusion
- Feature selection with honest estimation + p-value-based + bootstrap
- Feature selection with honest estimation + LASSO-based + bootstrap
- Implement PCA to create independent variables and not drop variables
- Bayesian fitting
- Change to use tidybayes and tidymodels
- Use neural networks to fit
- Consider a time-to-event analysis
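The backlog item on bootstrapping the whole process (how many models include each feature?) can be sketched as follows. This is an illustrative sketch on simulated data, not the project's pipeline; the p-value-based selection rule and the 0.05 threshold are assumptions.

```r
# Sketch of the "bootstrap the whole process" idea: resample the data,
# run a simple p-value-based selection on each resample, and count how
# often each candidate feature is retained. The data here are simulated;
# the real analysis would pass df_multi and the exposure columns instead.
set.seed(1891)
n <- 200
df <- data.frame(
  x1 = rnorm(n), # true predictor
  x2 = rnorm(n), # noise
  x3 = rnorm(n)  # noise
)
df$y <- rbinom(n, 1, plogis(1.5 * df$x1))

features <- c("x1", "x2", "x3")
n_boot <- 100
selected <- matrix(
  FALSE,
  nrow = n_boot, ncol = length(features),
  dimnames = list(NULL, features)
)

for (b in seq_len(n_boot)) {
  # Resample rows with replacement and refit the model
  boot_df <- df[sample(nrow(df), replace = TRUE), ]
  fit <- glm(y ~ x1 + x2 + x3, data = boot_df, family = binomial())
  # Retain features whose coefficient is significant in this resample
  pvals <- summary(fit)$coefficients[features, "Pr(>|z|)"]
  selected[b, ] <- pvals < 0.05
}

# Selection frequency per feature: stable signals should be near 1,
# noise features near the nominal error rate
colMeans(selected)
```

The same loop could wrap a LASSO or forward/backward selection step instead of the per-coefficient p-value rule.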
Setup
Imports
# Load packages
library(glmnet)
library(magrittr)
library(selectiveInference)
library(tidyverse)
# Data
# https://office365stanford-my.sharepoint.com/:x:/r/personal/simonlev_stanford_edu/_layouts/15/Doc.aspx?sourcedoc=%7BCE7DAF19-8F1A-43B0-866D-3D0F36336EDC%7D&file=Peds%20AVM%20Database%20July%2010%202024%20Onwards.xlsx&fromShare=true&action=default&mobileredirect=true
# Source code
Configurations
# Source outcome-specific configs
source(params$CONFIG)
# Destination locations
DST_DIRNAME <- "../../outputs/"
# Source locations
SRC_DIRNAME <- "../../data/3_tidy/patients"
SRC_BASENAME <- "patients-daily_2024-07-21.csv"
# Should the Dockerfile be automatically updated?
UPDATE_DOCKERFILE <- FALSE
# Set the seed
set.seed(1891)
Parameters
EXPOSURES_CONTINUOUS <- c(
"Age at 1st treatment (years)" = "age_at_first_treatment_yrs",
"mRS (presentation)" = "modified_rankin_score_presentation",
"mRS (pre-treatment)" = "modified_rankin_score_pretreatment",
"mRS (1-week post-op)" = "modified_rankin_score_postop_within_1_week",
"mRS (final)" = "modified_rankin_score_final",
"Nidus size (cm)" = "max_size_cm",
"Spetzler-Martin grade" = "spetzler_martin_grade",
"Lawton-Young grade" = "lawton_young_grade",
"Size score" = "size_score"
)
EXPOSURES_BINARY <- c(
"Sex (Male)" = "is_male",
"Involves eloquent location" = "is_eloquent_location",
"Has deep venous drainage" = "has_deep_venous_drainage",
"Diffuse nidus" = "is_diffuse_nidus",
"Hemorrhage at presentation" = "has_hemorrhage",
"Seizures at presentation" = "has_seizures",
"Deficit at presentation" = "has_deficit",
"Paresis at presentation" = "has_paresis",
"Presence of aneurysms" = "has_associated_aneurysm",
"Spetzler-Martin grade < 4" = "is_spetzler_martin_grade_less_than_4",
"Had surgery" = "is_surgery"
)
EXPOSURES_CATEGORICAL <- c(
"Location" = "location",
"Venous drainage" = "venous_drainage",
"Modality of treatment" = "procedure_combinations"
)
OUTCOMES <- c(
"Poor outcome (mRS >= 3)" = "is_poor_outcome",
"Obliteration" = "is_obliterated",
"Complications - minor" = "has_complication_minor",
"Complications - major" = "has_complication_major",
"mRS change (final - pre-treatment)" =
"modified_rankin_score_final_minus_pretx",
"mRS change (final - presentation)" =
"modified_rankin_score_final_minus_presentation",
"mRS change direction (Final - Pre-tx)" = "change_in_mrs_tx_vs_final"
)
Functions
R utils.
Data analysis utils.
Read
# File path
filepath <- file.path(SRC_DIRNAME, SRC_BASENAME)
# Read
patients_ <-
read_csv(filepath) %>%
dplyr::sample_frac(params$PROPORTION_OF_DATA)
# If analyzing a subset of the data, restrict the dataset
if (params$TITLE == "Poor outcome - Hemorrhage") {
patients_ <- patients_ %>% filter(has_hemorrhage)
}
if (params$TITLE == "Poor outcome - No Hemorrhage") {
patients_ <- patients_ %>% filter(!has_hemorrhage)
}
Conform
Setup
Remove all empty rows.
Create two dataframes - one for univariable and one for multivariable analysis. This is because variables for multivariable analysis will be recoded to reduce the number of levels to avoid overfitting the model.
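The chunk performing this split does not appear in this export; a minimal sketch of the step described above, assuming `patients_` from the Read chunk (the toy tibble below stands in only for illustration):

```r
library(dplyr)

# Toy stand-in for patients_; the real data come from the Read chunk above.
patients_ <- tibble(
  mrn = c(1, 2, NA),
  age = c(10, NA, NA),
  mrs = c(2, 3, NA)
)

# Drop rows that are entirely empty, then branch into two working copies:
# df_uni keeps the original coding for univariable analysis, while df_multi
# will have selected variables recoded to fewer levels in the chunks below.
patients <-
  patients_ %>%
  filter(!if_all(everything(), is.na))
df_uni <- patients
df_multi <- patients
```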
Recode columns
For venous_drainage if “Both”, mark as “Deep.”
df_multi <-
df_multi %>%
mutate(
venous_drainage = ifelse(venous_drainage == "Both", "Deep", venous_drainage)
)
For procedure_combinations, recode entries with more than one modality (string length > 1) as "Multimodal" to reduce the number of levels.
df_multi <-
df_multi %>%
mutate(procedure_combinations = case_when(
nchar(procedure_combinations) > 1 ~ "Multimodal",
.default = procedure_combinations
))
For procedure_combinations, indicate whether the treatment was surgery-based.
df_multi <-
df_multi %>%
mutate(
is_surgery = str_detect(procedure_combinations, "S")
) %>%
relocate(is_surgery, .after = procedure_combinations)
df_uni <-
df_uni %>%
mutate(
is_surgery = str_detect(procedure_combinations, "S")
) %>%
relocate(is_surgery, .after = procedure_combinations)
For location, reduce the number of levels (recode rare locations as "Other").
df_multi <-
df_multi %>%
mutate(
location = ifelse(location == "Corpus Callosum", "Other", location),
location = ifelse(location == "Cerebellum", "Other", location),
# location = ifelse(location == "Deep", "Other", location),
location = factor(location),
location = relevel(location, ref = "Other")
)
For spetzler_martin_grade, create a binary variant (grades 1-3 vs. 4-5).
# For multivariable analysis
df_multi <-
df_multi %>%
mutate(
# Divides into low grade vs. high grade
is_spetzler_martin_grade_less_than_4 = spetzler_martin_grade < 4
) %>%
relocate(is_spetzler_martin_grade_less_than_4, .after = spetzler_martin_grade)
# For univariable analysis
df_uni <-
df_uni %>%
mutate(
is_spetzler_martin_grade_less_than_4 = spetzler_martin_grade < 4
) %>%
relocate(is_spetzler_martin_grade_less_than_4, .after = spetzler_martin_grade)
Missingness
# Get cols
cols <- unname(c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
EXPOSURES_CATEGORICAL,
OUTCOMES
))
# Count missingness
df_multi %>%
select(all_of(cols)) %>%
summarise_all(~ sum(is.na(.))) %>%
pivot_longer(everything(), values_to = "num_missing") %>%
arrange(desc(num_missing)) %>%
filter(num_missing > 0) %>%
sable()
| name | num_missing |
|---|---|
| modified_rankin_score_pretreatment | 2 |
| modified_rankin_score_postop_within_1_week | 2 |
| lawton_young_grade | 2 |
| has_associated_aneurysm | 2 |
| is_surgery | 2 |
| procedure_combinations | 2 |
| modified_rankin_score_final_minus_pretx | 2 |
| is_male | 1 |
| is_diffuse_nidus | 1 |
| location | 1 |
| has_complication_minor | 1 |
| has_complication_major | 1 |
Which eligible patients are missing each variable?
df_multi %>%
filter(is_eligible) %>%
select(mrn, all_of(cols)) %>%
mutate(across(-mrn, is.na)) %>%
pivot_longer(
cols = -mrn,
names_to = "name",
values_to = "is_missing"
) %>%
filter(is_missing) %>%
group_by(name) %>%
summarize(mrn = str_c(mrn, collapse = ", ")) %>%
sable()
| name | mrn |
|---|---|
| has_associated_aneurysm | 60310463, 61011961 |
| has_complication_major | 62023254 |
| has_complication_minor | 62023254 |
| is_diffuse_nidus | 15871379 |
| is_male | 19399469 |
| is_surgery | 49659493, 46166047 |
| lawton_young_grade | 60310463, 61011961 |
| location | 60310463 |
| modified_rankin_score_final_minus_pretx | 49659493, 46166047 |
| modified_rankin_score_postop_within_1_week | 49659493, 46166047 |
| modified_rankin_score_pretreatment | 49659493, 46166047 |
| procedure_combinations | 49659493, 46166047 |
Visualizations
Overall
Distribution of values of the exposures across levels of the outcome.
# Define the predictors of interest
cols <- c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
CHOSEN_OUTCOME
)
# Reshape the data
df_long <-
df_uni %>%
select(all_of(cols)) %>%
pivot_longer(-all_of(CHOSEN_OUTCOME), names_to = "predictor", values_to = "value")
# Plot the box plots
df_long %>%
ggplot(aes(
  x = .data[[CHOSEN_OUTCOME]],
  y = value,
  fill = .data[[CHOSEN_OUTCOME]],
  color = .data[[CHOSEN_OUTCOME]]
)) +
geom_boxplot(alpha = 0.5) +
facet_wrap(~predictor, scales = "free") +
labs(x = "Outcome", y = "Value", fill = "Outcome") +
theme_minimal() +
theme(
axis.text = element_text(size = 10),
axis.title = element_text(size = 11),
strip.text = element_text(size = 11),
legend.position = "none"
)
By hemorrhage
# Define the predictors of interest
cols <- c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
CHOSEN_OUTCOME
)
# Reshape the data
df_long <-
df_uni %>%
select(all_of(cols)) %>%
pivot_longer(
cols = -c(all_of(CHOSEN_OUTCOME), `Hemorrhage at presentation`),
names_to = "predictor",
values_to = "value"
)
# Plot the box plots
df_long %>%
ggplot(aes(
  x = .data[[CHOSEN_OUTCOME]],
  y = value,
  fill = `Hemorrhage at presentation`,
  color = `Hemorrhage at presentation`
)) +
geom_boxplot(alpha = 0.5) +
facet_wrap(~predictor, scales = "free") +
labs(x = "Outcome", y = "Value", color = "Hemorrhage at presentation") +
theme_minimal() +
guides(fill = "none") +
theme(
axis.text = element_text(size = 10),
axis.title = element_text(size = 11),
strip.text = element_text(size = 11)
)
By SM grade high vs. low
# Define the predictors of interest
cols <- c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
CHOSEN_OUTCOME
)
# Reshape the data
df_long <-
df_uni %>%
select(all_of(cols)) %>%
pivot_longer(
cols = -c(all_of(CHOSEN_OUTCOME), `Spetzler-Martin grade < 4`),
names_to = "predictor",
values_to = "value"
)
# Plot the box plots
df_long %>%
ggplot(aes(
  x = .data[[CHOSEN_OUTCOME]],
  y = value,
  fill = `Spetzler-Martin grade < 4`,
  color = `Spetzler-Martin grade < 4`
)) +
geom_boxplot(alpha = 0.5) +
facet_wrap(~predictor, scales = "free") +
labs(x = "Outcome", y = "Value", color = "Spetzler-Martin grade < 4") +
theme_minimal() +
guides(fill = "none") +
theme(
axis.text = element_text(size = 10),
axis.title = element_text(size = 11),
strip.text = element_text(size = 11)
)
Descriptive statistics
Setup
Descriptive statistics - continuous variables.
compute_descriptive_continuous <- function(
df = df_uni,
cols = c(EXPOSURES_CONTINUOUS),
use_interaction = TRUE,
interaction_col = "has_hemorrhage",
interaction_col_name = "Hemorrhage at presentation",
interaction_col_value = TRUE,
prefix = "Haem") {
# Apply desired formatting to numbers
format_numbers <- function(x) {
sprintf("%.1f", x)
}
# Compute new dataset
df <-
df %>%
select(cols, CHOSEN_OUTCOME, interaction_col) %>%
drop_na(CHOSEN_OUTCOME, interaction_col)
# Apply interaction
if (use_interaction) {
df <-
df %>%
filter(!!sym(interaction_col) == interaction_col_value)
}
# Compute descriptive statistics
desc_stats <-
df %>%
pivot_longer(where(is.numeric), names_to = "keys", values_to = "values") %>%
group_by(!!sym(CHOSEN_OUTCOME), keys) %>%
summarize(
num_total = n(),
num_missing = sum(is.na(values)),
mean = mean(values, na.rm = TRUE),
sd = sd(values, na.rm = TRUE),
min = quantile(values, 0, na.rm = TRUE),
max = quantile(values, 1, na.rm = TRUE),
q_50 = quantile(values, 0.50, na.rm = TRUE),
q_25 = quantile(values, 0.25, na.rm = TRUE),
q_75 = quantile(values, 0.75, na.rm = TRUE)
) %>%
ungroup() %>%
mutate(
stats = glue::glue("{format_numbers(mean)} ({format_numbers(sd)})"),
!!sym(CHOSEN_OUTCOME) := factor(
!!sym(CHOSEN_OUTCOME),
levels = CHOSEN_OUTCOME_LEVELS,
labels = CHOSEN_OUTCOME_LABELS
)
) %>%
pivot_wider(
id_cols = keys,
names_from = !!sym(CHOSEN_OUTCOME),
values_from = stats
) %>%
relocate(CHOSEN_OUTCOME_LABELS[2], .after = CHOSEN_OUTCOME_LABELS[1])
# Compute p-values
pvals <-
df %>%
pivot_longer(where(is.numeric), names_to = "keys", values_to = "values") %>%
group_by(keys) %>%
summarize(
pvalue = wilcox.test(values ~ !!sym(CHOSEN_OUTCOME))$p.value
)
# Synthesize
out <-
desc_stats %>%
left_join(pvals)
# Add sample size to column names
num_with_outcome <- sum(df %>% pull(CHOSEN_OUTCOME))
num_no_outcome <- sum(!df %>% pull(CHOSEN_OUTCOME))
new_col_names <- c(
glue::glue("{prefix} - {CHOSEN_OUTCOME_LABELS[2]} (n={num_with_outcome})"),
glue::glue("{prefix} - {CHOSEN_OUTCOME_LABELS[1]} (n={num_no_outcome})"),
glue::glue("{prefix} - P-value")
)
out <-
out %>%
rename_with(
.cols = !!sym(CHOSEN_OUTCOME_LABELS[1]):pvalue, ~new_col_names
) %>%
mutate(
keys = ifelse(keys == interaction_col, interaction_col_name, keys)
)
return(out)
}
Descriptive statistics - binary variables.
compute_descriptive_binary <- function(
df = df_uni,
cols = c(EXPOSURES_BINARY),
use_interaction = TRUE,
interaction_col = "has_hemorrhage",
interaction_col_name = "Hemorrhage at presentation",
interaction_col_value = TRUE,
prefix = "Haem") {
# Define arguments
cols <- c(cols[!cols == interaction_col])
# Apply desired formatting to numbers
format_numbers <- function(x, decimals = 0) {
decimals <- paste0("%.", decimals, "f")
sprintf(decimals, x)
}
# Compute new dataset
df <-
df %>%
select(all_of(cols), CHOSEN_OUTCOME, interaction_col) %>%
drop_na(CHOSEN_OUTCOME, interaction_col)
# Apply interaction
if (use_interaction) {
df <-
df %>%
filter(!!sym(interaction_col) == interaction_col_value)
}
# Compute descriptive statistics
desc_stats <-
df %>%
pivot_longer(
cols = -c(CHOSEN_OUTCOME),
names_to = "keys",
values_to = "values"
) %>%
group_by(!!sym(CHOSEN_OUTCOME), keys) %>%
summarize(
num_total = n(),
num_missing = sum(is.na(values)),
num_with = sum(values, na.rm = TRUE),
num_without = sum(!values, na.rm = TRUE),
pct_with = mean(values, na.rm = TRUE)
) %>%
ungroup() %>%
mutate(
stats = glue::glue("{num_with} ({format_numbers(pct_with*100)}%)"),
!!sym(CHOSEN_OUTCOME) := factor(
!!sym(CHOSEN_OUTCOME),
levels = CHOSEN_OUTCOME_LEVELS,
labels = CHOSEN_OUTCOME_LABELS
)
) %>%
pivot_wider(
id_cols = keys,
names_from = !!sym(CHOSEN_OUTCOME),
values_from = stats
) %>%
relocate(CHOSEN_OUTCOME_LABELS[1], .after = CHOSEN_OUTCOME_LABELS[2])
# Compute p-values
pvals <-
df %>%
pivot_longer(
cols = -c(CHOSEN_OUTCOME),
names_to = "keys",
values_to = "values"
) %>%
group_by(keys) %>%
summarize(
pvalue = kruskal.test(values ~ !!sym(CHOSEN_OUTCOME))$p.value
)
# Synthesize
out <-
desc_stats %>%
left_join(pvals)
# Prettify
num_with_outcome <- sum(df %>% pull(CHOSEN_OUTCOME))
num_no_outcome <- sum(!df %>% pull(CHOSEN_OUTCOME))
new_col_names <- c(
glue::glue("{prefix} - {CHOSEN_OUTCOME_LABELS[2]} (n={num_with_outcome})"),
glue::glue("{prefix} - {CHOSEN_OUTCOME_LABELS[1]} (n={num_no_outcome})"),
glue::glue("{prefix} - P-value")
)
out <-
out %>%
rename_with(
.cols = !!sym(CHOSEN_OUTCOME_LABELS[2]):pvalue, ~new_col_names
) %>%
mutate(
keys = ifelse(keys == interaction_col, interaction_col_name, keys)
)
return(out)
}
Non-specific
# Rename dataframe
`Pediatric AVMs` <-
df_uni %>%
filter(is_eligible) %>%
select(-is_eligible, -comments)
# Create summary
`Pediatric AVMs` %>%
summarytools::dfSummary(display.labels = FALSE) %>%
print(
file =
"../../outputs/descriptive-statistics/descriptive_statistics_eligible.html",
footnote = NA
)
# Remove unwanted dataframe
remove(`Pediatric AVMs`)
By hemorrhage - table1 (WIP)
https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html
library(table1)
table1::table1(
~ factor(is_male) + age_at_first_treatment_yrs +
  factor(is_eloquent_location) + max_size_cm | has_hemorrhage,
data = df_uni
)
# Initialize values
df <- df_uni
# Remove missing values in stratification variables
df <-
df %>%
drop_na(has_hemorrhage, CHOSEN_OUTCOME)
# Define columns
cols <- c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
OUTCOMES[OUTCOMES == CHOSEN_OUTCOME]
)
# Assign labels to the variables in the dataframe
for (var in cols) {
label(df[[var]]) <- names(cols[cols == var])
}
# Assign meaningful labels to the has_hemorrhage variable
df$has_hemorrhage <- factor(
df$has_hemorrhage,
levels = c(FALSE, TRUE),
labels = c("No Hemorrhage", "Has Hemorrhage")
)
df[, CHOSEN_OUTCOME] <- factor(
df %>% pull(CHOSEN_OUTCOME),
levels = CHOSEN_OUTCOME_LEVELS,
labels = CHOSEN_OUTCOME_LABELS
)
# Define custom rendering functions if needed
render_continuous <- function(x) {
with(stats.apply.rounding(stats.default(x), digits = 2), c(
"Mean (SD)" = sprintf("%s (± %s)", MEAN, SD),
"Median (IQR)" = sprintf("%s (%s - %s)", MEDIAN, Q1, Q3)
))
}
compute_pvalues <- function(x, ...) {
# Construct vectors of data y, and groups (strata) g
y <- unlist(x)
g <- factor(rep(1:length(x), times = sapply(x, length)))
if (is.numeric(y)) {
# For numeric variables, perform a standard 2-sample t-test
p <- t.test(y ~ g)$p.value
} else {
# For categorical variables, perform a chi-squared test of independence
p <- chisq.test(table(y, g))$p.value
}
# Format the p-value, using an HTML entity for the less-than sign.
# The initial empty string places the output on the line below the variable label.
c("", sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001)))
}
# Create the descriptive table
frla <- as.formula(paste("~ . | has_hemorrhage *", CHOSEN_OUTCOME))
# Create table
table1_descriptive_statistics <-
table1(frla,
data = df[, unname(cols)],
render.continuous = render_continuous,
overall = c(left = "Overall"),
topclass = "Rtable1-zebra"
)
# Print
table1_descriptive_statistics
By hemorrhage
if (!str_detect(params$TITLE, "Hemorrhage")) {
# Define arguments
interaction_col <- "has_hemorrhage"
interaction_col_name <- "Hemorrhage at presentation"
prefix_true <- "Haem"
prefix_false <- "No Haem"
# Get all
out <- list()
out[[1]] <- compute_descriptive_continuous(
use_interaction = F,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = "All"
)
out[[2]] <- compute_descriptive_binary(
use_interaction = F,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = "All"
)
out_all <- bind_rows(out)
# Get with hemorrhage
out <- list()
out[[1]] <- compute_descriptive_continuous(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = prefix_true
)
out[[2]] <- compute_descriptive_binary(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = prefix_true
)
out_haem <- bind_rows(out)
# Get without hemorrhage
out <- list()
out[[1]] <- compute_descriptive_continuous(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = F,
prefix = prefix_false
)
out[[2]] <- compute_descriptive_binary(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = F,
prefix = prefix_false
)
out_no_haem <- bind_rows(out)
# Synthesize
out_complete <-
out_all %>%
left_join(out_haem, by = "keys") %>%
left_join(out_no_haem, by = "keys") %>%
arrange(`All - P-value`)
# Print
out_complete %>%
sable()
}
| keys | All - Obliterated (n=178) | All - Not obliterated (n=50) | All - P-value | Haem - Obliterated (n=117) | Haem - Not obliterated (n=20) | Haem - P-value | No Haem - Obliterated (n=61) | No Haem - Not obliterated (n=30) | No Haem - P-value |
|---|---|---|---|---|---|---|---|---|---|
| Had surgery | 123 (70%) | 10 (20%) | 0.00000 | 87 (76%) | 3 (15%) | 0.00000 | 36 (59%) | 7 (23%) | 0.00143 |
| Lawton-Young grade | 5.7 (1.5) | 4.0 (1.4) | 0.00000 | 5.2 (1.4) | 3.7 (1.3) | 0.00009 | 6.1 (1.4) | 4.7 (1.4) | 0.00004 |
| Spetzler-Martin grade | 3.8 (1.1) | 2.6 (1.2) | 0.00000 | 3.6 (1.0) | 2.5 (1.1) | 0.00020 | 3.9 (1.1) | 2.8 (1.2) | 0.00006 |
| Nidus size (cm) | 5.1 (3.0) | 3.0 (1.8) | 0.00000 | 4.3 (2.9) | 2.8 (1.8) | 0.02157 | 5.6 (3.0) | 3.4 (1.8) | 0.00006 |
| Spetzler-Martin grade < 4 | 138 (78%) | 21 (42%) | 0.00000 | 95 (81%) | 11 (55%) | 0.00994 | 43 (70%) | 10 (33%) | 0.00078 |
| Size score | 2.1 (0.8) | 1.6 (0.7) | 0.00000 | 1.8 (0.8) | 1.5 (0.6) | 0.06775 | 2.4 (0.7) | 1.7 (0.7) | 0.00018 |
| Involves eloquent location | 95 (53%) | 44 (88%) | 0.00001 | 56 (48%) | 20 (100%) | 0.00002 | 39 (64%) | 24 (80%) | 0.12057 |
| Diffuse nidus | 15 (8%) | 16 (32%) | 0.00002 | 10 (9%) | 6 (30%) | 0.00595 | 5 (8%) | 10 (33%) | 0.00285 |
| mRS (final) | 1.8 (1.8) | 0.8 (0.9) | 0.00012 | 2.4 (1.9) | 0.8 (0.9) | 0.00019 | 1.5 (1.7) | 0.7 (0.9) | 0.02199 |
| Hemorrhage at presentation | 117 (66%) | 20 (40%) | 0.00105 | 117 (100%) | 20 (100%) | | 0 (0%) | 0 (0%) | |
| Has deep venous drainage | 90 (51%) | 35 (70%) | 0.01489 | 67 (57%) | 15 (75%) | 0.13628 | 23 (38%) | 20 (67%) | 0.00968 |
| Deficit at presentation | 59 (33%) | 26 (52%) | 0.01507 | 40 (34%) | 11 (55%) | 0.07626 | 19 (31%) | 15 (50%) | 0.08222 |
| mRS (presentation) | 1.8 (1.6) | 2.4 (1.8) | 0.01602 | 2.9 (1.8) | 3.2 (1.7) | 0.47279 | 1.1 (1.0) | 1.1 (1.0) | 0.95248 |
| Presence of aneurysms | 33 (19%) | 13 (27%) | 0.19296 | 27 (23%) | 7 (35%) | 0.25571 | 6 (10%) | 6 (21%) | 0.13927 |
| Paresis at presentation | 43 (24%) | 15 (30%) | 0.40296 | 31 (26%) | 9 (45%) | 0.09377 | 12 (20%) | 6 (20%) | 0.97072 |
| mRS (pre-treatment) | 1.5 (1.4) | 1.6 (1.5) | 0.49277 | 2.2 (1.6) | 2.0 (1.6) | 0.59646 | 1.0 (0.9) | 1.0 (0.9) | 0.84416 |
| Sex (Male) | 92 (52%) | 28 (57%) | 0.49892 | 63 (54%) | 10 (50%) | 0.75091 | 29 (48%) | 18 (62%) | 0.19975 |
| Seizures at presentation | 42 (24%) | 14 (28%) | 0.52355 | 22 (19%) | 7 (35%) | 0.10255 | 20 (33%) | 7 (23%) | 0.35604 |
| Age at 1st treatment (years) | 11.2 (3.7) | 11.3 (3.9) | 0.80199 | 11.3 (3.1) | 10.9 (3.6) | 0.65741 | 11.1 (4.1) | 12.1 (4.3) | 0.21395 |
| mRS (1-week post-op) | 1.8 (1.5) | 1.6 (1.3) | 0.82396 | 2.4 (1.6) | 1.9 (1.3) | 0.21666 | 1.4 (1.3) | 1.2 (1.1) | 0.75637 |
By diffuse nidus
# Define arguments
interaction_col <- "is_diffuse_nidus"
interaction_col_name <- "Is diffuse nidus"
prefix_true <- "Diffuse"
prefix_false <- "Not Diffuse"
# Get all
out <- list()
out[[1]] <- compute_descriptive_continuous(
use_interaction = F,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = "All"
)
out[[2]] <- compute_descriptive_binary(
use_interaction = F,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = "All"
)
out_all <- bind_rows(out)
# Get diffuse
out <- list()
out[[1]] <- compute_descriptive_continuous(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = prefix_true
)
out[[2]] <- compute_descriptive_binary(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = prefix_true
)
out_haem <- bind_rows(out)
# Get non-diffuse
out <- list()
out[[1]] <- compute_descriptive_continuous(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = F,
prefix = prefix_false
)
out[[2]] <- compute_descriptive_binary(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = F,
prefix = prefix_false
)
out_no_haem <- bind_rows(out)
# Synthesize
out_complete <-
out_all %>%
left_join(out_haem, by = "keys") %>%
left_join(out_no_haem, by = "keys") %>%
arrange(`All - P-value`)
# Print
out_complete %>%
sable()
| keys | All - Obliterated (n=177) | All - Not obliterated (n=50) | All - P-value | Diffuse - Obliterated (n=15) | Diffuse - Not obliterated (n=16) | Diffuse - P-value | Not Diffuse - Obliterated (n=162) | Not Diffuse - Not obliterated (n=34) | Not Diffuse - P-value |
|---|---|---|---|---|---|---|---|---|---|
| Had surgery | 123 (70%) | 10 (20%) | 0.00000 | 11 (73%) | 4 (25%) | 0.00811 | 112 (70%) | 6 (18%) | 0.00000 |
| Lawton-Young grade | 5.7 (1.5) | 4.0 (1.4) | 0.00000 | 7.1 (1.0) | 6.3 (1.2) | 0.09077 | 5.1 (1.2) | 3.8 (1.2) | 0.00000 |
| Spetzler-Martin grade | 3.8 (1.1) | 2.6 (1.2) | 0.00000 | 4.6 (0.6) | 4.1 (1.0) | 0.13386 | 3.4 (1.0) | 2.5 (1.1) | 0.00002 |
| Nidus size (cm) | 5.1 (3.0) | 3.0 (1.8) | 0.00000 | 7.4 (3.4) | 5.3 (2.2) | 0.09128 | 4.0 (2.0) | 2.8 (1.7) | 0.00059 |
| Spetzler-Martin grade < 4 | 137 (77%) | 21 (42%) | 0.00000 | 4 (27%) | 1 (6%) | 0.12866 | 133 (82%) | 20 (59%) | 0.00294 |
| Size score | 2.1 (0.8) | 1.6 (0.7) | 0.00000 | 2.6 (0.6) | 2.3 (0.7) | 0.22405 | 1.9 (0.8) | 1.5 (0.6) | 0.00127 |
| Involves eloquent location | 94 (53%) | 44 (88%) | 0.00001 | 13 (87%) | 16 (100%) | 0.13739 | 81 (50%) | 28 (82%) | 0.00057 |
| Is diffuse nidus | 15 (8%) | 16 (32%) | 0.00002 | 15 (100%) | 16 (100%) | | 0 (0%) | 0 (0%) | |
| mRS (final) | 1.8 (1.8) | 0.8 (0.9) | 0.00014 | 2.6 (1.8) | 1.0 (0.9) | 0.00762 | 1.5 (1.7) | 0.8 (0.9) | 0.05093 |
| Hemorrhage at presentation | 117 (66%) | 20 (40%) | 0.00089 | 10 (67%) | 6 (38%) | 0.11015 | 107 (66%) | 14 (41%) | 0.00681 |
| Deficit at presentation | 58 (33%) | 26 (52%) | 0.01308 | 7 (47%) | 11 (69%) | 0.22059 | 51 (31%) | 15 (44%) | 0.15742 |
| Has deep venous drainage | 90 (51%) | 35 (70%) | 0.01645 | 13 (87%) | 14 (88%) | 0.94575 | 77 (48%) | 21 (62%) | 0.13226 |
| mRS (presentation) | 1.8 (1.6) | 2.5 (1.8) | 0.01646 | 2.0 (1.6) | 2.0 (1.6) | 0.98320 | 1.7 (1.7) | 2.5 (1.8) | 0.01689 |
| Presence of aneurysms | 32 (18%) | 13 (27%) | 0.16753 | 4 (27%) | 6 (38%) | 0.52586 | 28 (17%) | 7 (22%) | 0.53817 |
| Paresis at presentation | 42 (24%) | 15 (30%) | 0.36760 | 5 (33%) | 7 (44%) | 0.55830 | 37 (23%) | 8 (24%) | 0.93088 |
| mRS (pre-treatment) | 1.5 (1.4) | 1.6 (1.5) | 0.51007 | 1.9 (1.6) | 1.7 (1.3) | 0.96677 | 1.3 (1.3) | 1.6 (1.5) | 0.27834 |
| Sex (Male) | 92 (52%) | 28 (57%) | 0.52231 | 6 (40%) | 10 (62%) | 0.21781 | 86 (53%) | 18 (55%) | 0.87861 |
| Seizures at presentation | 42 (24%) | 14 (28%) | 0.53705 | 6 (40%) | 5 (31%) | 0.61668 | 36 (22%) | 9 (26%) | 0.59326 |
| Age at 1st treatment (years) | 11.2 (3.7) | 11.3 (3.9) | 0.76249 | 10.2 (3.6) | 10.9 (3.9) | 0.63239 | 11.7 (3.7) | 11.4 (3.9) | 0.64510 |
| mRS (1-week post-op) | 1.8 (1.5) | 1.6 (1.3) | 0.80914 | 2.2 (1.6) | 1.6 (1.1) | 0.31152 | 1.5 (1.4) | 1.6 (1.3) | 0.52785 |
By SM grade high vs. low
# Define arguments
interaction_col <- "is_spetzler_martin_grade_less_than_4"
interaction_col_name <- "Spetzler-Martin grade < 4"
prefix_true <- "Low grade"
prefix_false <- "High grade"
# Get all
out <- list()
out[[1]] <- compute_descriptive_continuous(
use_interaction = F,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = "All"
)
out[[2]] <- compute_descriptive_binary(
use_interaction = F,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = "All"
)
out_all <- bind_rows(out)
# Get low grade
out <- list()
out[[1]] <- compute_descriptive_continuous(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = prefix_true
)
out[[2]] <- compute_descriptive_binary(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = T,
prefix = prefix_true
)
out_haem <- bind_rows(out)
# Get high grade
out <- list()
out[[1]] <- compute_descriptive_continuous(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = F,
prefix = prefix_false
)
out[[2]] <- compute_descriptive_binary(
use_interaction = T,
interaction_col = interaction_col,
interaction_col_name = interaction_col_name,
interaction_col_value = F,
prefix = prefix_false
)
out_no_haem <- bind_rows(out)
# Synthesize
out_complete <-
out_all %>%
left_join(out_haem, by = "keys") %>%
left_join(out_no_haem, by = "keys") %>%
arrange(`All - P-value`)
# Print
out_complete %>%
sable()
| keys | All - Obliterated (n=178) | All - Not obliterated (n=50) | All - P-value | Low grade - Obliterated (n=138) | Low grade - Not obliterated (n=21) | Low grade - P-value | High grade - Obliterated (n=40) | High grade - Not obliterated (n=29) | High grade - P-value |
|---|---|---|---|---|---|---|---|---|---|
| Had surgery | 123 (70%) | 10 (20%) | 0.00000 | 99 (73%) | 4 (19%) | 0.00000 | 24 (60%) | 6 (21%) | 0.00125 |
| Lawton-Young grade | 5.7 (1.5) | 4.0 (1.4) | 0.00000 | 4.3 (0.7) | 3.5 (1.0) | 0.00130 | 6.7 (0.9) | 5.9 (0.9) | 0.00190 |
| Spetzler-Martin grade | 3.8 (1.1) | 2.6 (1.2) | 0.00000 | 2.7 (0.6) | 2.1 (0.8) | 0.00148 | 4.6 (0.5) | 4.2 (0.4) | 0.00579 |
| Nidus size (cm) | 5.1 (3.0) | 3.0 (1.8) | 0.00000 | 3.1 (1.8) | 2.4 (1.1) | 0.04734 | 6.5 (2.9) | 5.2 (2.1) | 0.02477 |
| Spetzler-Martin grade < 4 | 138 (78%) | 21 (42%) | 0.00000 | 138 (100%) | 21 (100%) | | 0 (0%) | 0 (0%) | |
| Size score | 2.1 (0.8) | 1.6 (0.7) | 0.00000 | 1.5 (0.6) | 1.3 (0.5) | 0.18362 | 2.6 (0.5) | 2.4 (0.5) | 0.11269 |
| Involves eloquent location | 95 (53%) | 44 (88%) | 0.00001 | 58 (42%) | 15 (71%) | 0.01205 | 37 (92%) | 29 (100%) | 0.13440 |
| Diffuse nidus | 15 (8%) | 16 (32%) | 0.00002 | 4 (3%) | 1 (5%) | 0.65442 | 11 (28%) | 15 (52%) | 0.04188 |
| mRS (final) | 1.8 (1.8) | 0.8 (0.9) | 0.00012 | 1.5 (1.7) | 0.7 (0.8) | 0.05258 | 2.1 (1.9) | 1.1 (1.1) | 0.02674 |
| Hemorrhage at presentation | 117 (66%) | 20 (40%) | 0.00105 | 95 (69%) | 11 (52%) | 0.13729 | 22 (55%) | 9 (31%) | 0.04987 |
| Has deep venous drainage | 90 (51%) | 35 (70%) | 0.01489 | 58 (42%) | 10 (48%) | 0.63062 | 32 (80%) | 25 (86%) | 0.50506 |
| Deficit at presentation | 59 (33%) | 26 (52%) | 0.01507 | 45 (33%) | 7 (33%) | 0.94759 | 14 (35%) | 19 (66%) | 0.01289 |
| mRS (presentation) | 1.8 (1.6) | 2.4 (1.8) | 0.01602 | 1.8 (1.8) | 2.6 (1.9) | 0.05172 | 1.8 (1.6) | 2.0 (1.5) | 0.38747 |
| Presence of aneurysms | 33 (19%) | 13 (27%) | 0.19296 | 23 (17%) | 6 (32%) | 0.11749 | 10 (25%) | 7 (24%) | 0.93510 |
| Paresis at presentation | 43 (24%) | 15 (30%) | 0.40296 | 33 (24%) | 5 (24%) | 0.99176 | 10 (25%) | 10 (34%) | 0.39490 |
| mRS (pre-treatment) | 1.5 (1.4) | 1.6 (1.5) | 0.49277 | 1.4 (1.5) | 1.6 (1.5) | 0.38951 | 1.6 (1.3) | 1.6 (1.2) | 0.64857 |
| Sex (Male) | 92 (52%) | 28 (57%) | 0.49892 | 74 (54%) | 9 (45%) | 0.47186 | 18 (45%) | 19 (66%) | 0.09400 |
| Seizures at presentation | 42 (24%) | 14 (28%) | 0.52355 | 31 (22%) | 5 (24%) | 0.89115 | 11 (28%) | 9 (31%) | 0.75117 |
| Age at 1st treatment (years) | 11.2 (3.7) | 11.3 (3.9) | 0.80199 | 12.4 (2.7) | 11.3 (3.7) | 0.24215 | 10.3 (4.1) | 11.4 (4.4) | 0.29417 |
| mRS (1-week post-op) | 1.8 (1.5) | 1.6 (1.3) | 0.82396 | 1.5 (1.5) | 1.6 (1.3) | 0.66679 | 1.9 (1.5) | 1.8 (1.2) | 0.98990 |
Associations
Correlation matrix.
# Cols
cols <- c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
EXPOSURES_CATEGORICAL,
OUTCOMES
)
# Remove hemorrhage
if (str_detect(params$TITLE, "Hemorrhage")) {
cols <- cols[!cols == "has_hemorrhage"]
}
# Create new dataframe
df_ <-
df_uni %>%
select(all_of(cols))
# Convert the outcome variable to numeric for correlation calculation
df_ <-
df_ %>%
select(where(is.numeric), where(is.logical)) %>%
mutate(across(everything(), as.numeric))
# Compute the correlation matrix
cor_matrix <- cor(df_, use = "complete.obs", method = "spearman")
# Plot the correlation matrix
ggcorrplot::ggcorrplot(
cor_matrix,
method = "circle",
lab = TRUE,
lab_size = 2,
colors = c("red", "white", "green4"),
title = "Correlation Matrix",
hc.order = TRUE,
) +
theme(
axis.text.x = element_text(size = 8), # Adjust x-axis text size
axis.text.y = element_text(size = 8) # Adjust y-axis text size
)
Univariable statistics
Setup
Define function.
fit_model <- function(
df = df_uni, cols, is_sandwich = TRUE) {
# Initialize
out <- list()
for (i in seq_along(cols)) {
# Create formula
model <- as.formula(paste(CHOSEN_OUTCOME, "~", cols[i]))
fit <- suppressWarnings(glm(
model,
data = df,
family = binomial()
))
if (is_sandwich) {
# Calculate robust standard errors (sandwich)
robust_se <- sandwich::vcovHC(fit, type = "HC0")
# Compute robust confidence intervals
fit_results <- lmtest::coeftest(fit, vcov. = robust_se)
} else {
fit_results <- fit
}
# Summarize model coefficients
fit_summary <-
fit_results %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
stylized <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
out[[i]] <- stylized
}
return(out)
}
Unadjusted - Original
Fit logistic.
# Define cols
cols <- unname(c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
EXPOSURES_CATEGORICAL
))
# Fit model
out <- fit_model(df = df_uni, cols = cols, is_sandwich = T)
# Create table
univariable_unadjusted <-
out %>%
bind_rows() %>%
filter(Predictors != "(Intercept)") %>%
filter(`CI (high)` < 50) %>%
arrange(`P-values`)
# Print
univariable_unadjusted %>%
sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| modified_rankin_score_final | 0.54 | 0.12 | -4.98381 | 0.000 | 0.42 | 0.69 |
| max_size_cm | 0.69 | 0.08 | -4.63358 | 0.000 | 0.58 | 0.80 |
| spetzler_martin_grade | 0.40 | 0.16 | -5.74497 | 0.000 | 0.29 | 0.55 |
| lawton_young_grade | 0.47 | 0.12 | -6.18704 | 0.000 | 0.37 | 0.60 |
| size_score | 0.33 | 0.23 | -4.66773 | 0.000 | 0.21 | 0.53 |
| is_eloquent_locationTRUE | 0.16 | 0.46 | -4.03430 | 0.000 | 0.06 | 0.38 |
| is_diffuse_nidusTRUE | 0.20 | 0.41 | -4.00540 | 0.000 | 0.09 | 0.44 |
| is_spetzler_martin_grade_less_than_4TRUE | 4.76 | 0.34 | 4.61668 | 0.000 | 2.46 | 9.24 |
| is_surgeryTRUE | 9.28 | 0.39 | 5.71522 | 0.000 | 4.32 | 19.93 |
| venous_drainageSuperficial | 4.39 | 0.42 | 3.48242 | 0.000 | 1.91 | 10.08 |
| has_hemorrhageTRUE | 2.88 | 0.33 | 3.21157 | 0.001 | 1.51 | 5.48 |
| locationDeep | 0.07 | 1.05 | -2.52681 | 0.012 | 0.01 | 0.55 |
| has_deep_venous_drainageTRUE | 0.44 | 0.34 | -2.40409 | 0.016 | 0.22 | 0.86 |
| has_deficitTRUE | 0.46 | 0.32 | -2.40666 | 0.016 | 0.24 | 0.86 |
| venous_drainageDeep | 2.71 | 0.42 | 2.39697 | 0.017 | 1.20 | 6.12 |
| modified_rankin_score_presentation | 1.25 | 0.10 | 2.22691 | 0.026 | 1.03 | 1.52 |
| locationOther | 0.05 | 1.75 | -1.74397 | 0.081 | 0.00 | 1.46 |
| locationHemispheric | 0.25 | 1.05 | -1.32999 | 0.184 | 0.03 | 1.94 |
| procedure_combinationsERS | 3.75 | 1.00 | 1.31628 | 0.188 | 0.52 | 26.84 |
| has_associated_aneurysmTRUE | 0.61 | 0.38 | -1.29670 | 0.195 | 0.29 | 1.28 |
| has_paresisTRUE | 0.74 | 0.35 | -0.83640 | 0.403 | 0.37 | 1.49 |
| procedure_combinationsR | 2.17 | 0.96 | 0.80291 | 0.422 | 0.33 | 14.31 |
| procedure_combinationsER | 1.97 | 0.96 | 0.70656 | 0.480 | 0.30 | 13.01 |
| is_maleTRUE | 0.80 | 0.33 | -0.67700 | 0.498 | 0.42 | 1.52 |
| modified_rankin_score_pretreatment | 1.08 | 0.12 | 0.67581 | 0.499 | 0.86 | 1.36 |
| has_seizuresTRUE | 0.79 | 0.36 | -0.63845 | 0.523 | 0.39 | 1.61 |
| modified_rankin_score_postop_within_1_week | 0.93 | 0.13 | -0.54462 | 0.586 | 0.73 | 1.19 |
| age_at_first_treatment_yrs | 1.01 | 0.04 | 0.18340 | 0.854 | 0.93 | 1.09 |
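With 27 predictors tested one at a time, a multiplicity correction may be worth reporting alongside the raw p-values. A minimal sketch, assuming `univariable_unadjusted` from the chunk above is still in scope (note the rounding to 3 decimals above limits the precision of the adjustment):

```r
# Add Benjamini-Hochberg adjusted p-values to the univariable table
univariable_unadjusted %>%
  mutate(`P-values (BH)` = p.adjust(`P-values`, method = "BH")) %>%
  sable()
```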
Adjusted - Original
Fit logistic.
cols <- unname(c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
EXPOSURES_CATEGORICAL
))
# Initialize
out <- list()
df <- df_uni
for (i in seq_along(cols)) {
# Create formula
model <- as.formula(paste(
CHOSEN_OUTCOME,
"~",
"age_at_first_treatment_yrs + is_male +",
cols[i]
))
fit <- suppressWarnings(glm(
model,
data = df,
family = binomial()
))
# Calculate robust standard errors (sandwich)
robust_se <- sandwich::vcovHC(fit, type = "HC0")
# Compute robust confidence intervals
robust_ci <- lmtest::coeftest(fit, vcov. = robust_se)
# Summarize model coefficients
fit_summary <-
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
stylized <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
out[[i]] <- stylized
}
Print results.
# Create table
univariable_adjusted <-
out %>%
bind_rows() %>%
filter(Predictors != "(Intercept)") %>%
filter(!str_starts(Predictors, "age_at_first_treatment_yrs|is_male")) %>%
filter(`CI (high)` < 50) %>%
arrange(`P-values`)
# Print
univariable_adjusted %>%
sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| modified_rankin_score_final | 0.51 | 0.13 | -5.00299 | 0.000 | 0.40 | 0.67 |
| max_size_cm | 0.68 | 0.08 | -4.76743 | 0.000 | 0.58 | 0.79 |
| spetzler_martin_grade | 0.38 | 0.16 | -5.89898 | 0.000 | 0.28 | 0.53 |
| lawton_young_grade | 0.47 | 0.12 | -6.21220 | 0.000 | 0.37 | 0.59 |
| size_score | 0.32 | 0.24 | -4.79879 | 0.000 | 0.20 | 0.51 |
| is_eloquent_locationTRUE | 0.15 | 0.47 | -3.95835 | 0.000 | 0.06 | 0.39 |
| is_diffuse_nidusTRUE | 0.19 | 0.41 | -4.10478 | 0.000 | 0.08 | 0.42 |
| is_spetzler_martin_grade_less_than_4TRUE | 5.02 | 0.34 | 4.73994 | 0.000 | 2.58 | 9.78 |
| is_surgeryTRUE | 9.26 | 0.40 | 5.57769 | 0.000 | 4.23 | 20.23 |
| venous_drainageSuperficial | 4.80 | 0.43 | 3.61418 | 0.000 | 2.05 | 11.24 |
| has_hemorrhageTRUE | 2.84 | 0.34 | 3.10071 | 0.002 | 1.47 | 5.49 |
| locationDeep | 0.06 | 1.05 | -2.60427 | 0.009 | 0.01 | 0.51 |
| has_deficitTRUE | 0.44 | 0.32 | -2.59152 | 0.010 | 0.23 | 0.82 |
| has_deep_venous_drainageTRUE | 0.41 | 0.36 | -2.52344 | 0.012 | 0.20 | 0.82 |
| venous_drainageDeep | 2.79 | 0.42 | 2.46488 | 0.014 | 1.23 | 6.30 |
| modified_rankin_score_presentation | 1.24 | 0.10 | 2.18286 | 0.029 | 1.02 | 1.51 |
| has_associated_aneurysmTRUE | 0.58 | 0.38 | -1.41461 | 0.157 | 0.28 | 1.23 |
| locationHemispheric | 0.23 | 1.05 | -1.40092 | 0.161 | 0.03 | 1.80 |
| procedure_combinationsERS | 3.44 | 1.06 | 1.16326 | 0.245 | 0.43 | 27.54 |
| has_paresisTRUE | 0.72 | 0.36 | -0.93177 | 0.351 | 0.36 | 1.44 |
| procedure_combinationsR | 2.19 | 1.03 | 0.76328 | 0.445 | 0.29 | 16.36 |
| modified_rankin_score_postop_within_1_week | 0.91 | 0.12 | -0.72063 | 0.471 | 0.72 | 1.17 |
| has_seizuresTRUE | 0.78 | 0.36 | -0.69840 | 0.485 | 0.38 | 1.58 |
| modified_rankin_score_pretreatment | 1.07 | 0.12 | 0.54239 | 0.588 | 0.85 | 1.34 |
| procedure_combinationsER | 1.72 | 1.02 | 0.53156 | 0.595 | 0.23 | 12.85 |
Unadjusted - Recoded
Fit logistic.
# Define columns
cols <- unname(c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
EXPOSURES_CATEGORICAL
))
# Initialize
out <- list()
df <- df_multi
for (i in seq_along(cols)) {
# Create formula
model <- as.formula(paste(CHOSEN_OUTCOME, "~", cols[i]))
fit <- suppressWarnings(glm(
model,
data = df,
family = binomial()
))
# Calculate robust standard errors (sandwich)
robust_se <- sandwich::vcovHC(fit, type = "HC0")
# Compute robust confidence intervals
robust_ci <- lmtest::coeftest(fit, vcov. = robust_se)
# Summarize model coefficients
fit_summary <-
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
stylized <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
out[[i]] <- stylized
}
Print results.
# Create table
univariable_unadjusted_recoded <-
out %>%
bind_rows() %>%
filter(Predictors != "(Intercept)") %>%
filter(`CI (high)` < 50) %>%
arrange(`P-values`)
# Print
univariable_unadjusted_recoded %>%
sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| modified_rankin_score_final | 0.54 | 0.12 | -4.98381 | 0.000 | 0.42 | 0.69 |
| max_size_cm | 0.69 | 0.08 | -4.63358 | 0.000 | 0.58 | 0.80 |
| spetzler_martin_grade | 0.40 | 0.16 | -5.74497 | 0.000 | 0.29 | 0.55 |
| lawton_young_grade | 0.47 | 0.12 | -6.18704 | 0.000 | 0.37 | 0.60 |
| size_score | 0.33 | 0.23 | -4.66773 | 0.000 | 0.21 | 0.53 |
| is_eloquent_locationTRUE | 0.16 | 0.46 | -4.03430 | 0.000 | 0.06 | 0.38 |
| is_diffuse_nidusTRUE | 0.20 | 0.41 | -4.00540 | 0.000 | 0.09 | 0.44 |
| is_spetzler_martin_grade_less_than_4TRUE | 4.76 | 0.34 | 4.61668 | 0.000 | 2.46 | 9.24 |
| has_hemorrhageTRUE | 2.88 | 0.33 | 3.21157 | 0.001 | 1.51 | 5.48 |
| locationDeep | 0.09 | 0.77 | -3.05605 | 0.002 | 0.02 | 0.43 |
| venous_drainageSuperficial | 2.33 | 0.34 | 2.46962 | 0.014 | 1.19 | 4.57 |
| has_deep_venous_drainageTRUE | 0.44 | 0.34 | -2.40409 | 0.016 | 0.22 | 0.86 |
| has_deficitTRUE | 0.46 | 0.32 | -2.40666 | 0.016 | 0.24 | 0.86 |
| modified_rankin_score_presentation | 1.25 | 0.10 | 2.22691 | 0.026 | 1.03 | 1.52 |
| procedure_combinationsMultimodal | 5.02 | 0.94 | 1.72117 | 0.085 | 0.80 | 31.49 |
| locationHemispheric | 0.33 | 0.77 | -1.42543 | 0.154 | 0.07 | 1.51 |
| has_associated_aneurysmTRUE | 0.61 | 0.38 | -1.29670 | 0.195 | 0.29 | 1.28 |
| has_paresisTRUE | 0.74 | 0.35 | -0.83640 | 0.403 | 0.37 | 1.49 |
| procedure_combinationsR | 2.17 | 0.96 | 0.80291 | 0.422 | 0.33 | 14.31 |
| is_maleTRUE | 0.80 | 0.33 | -0.67700 | 0.498 | 0.42 | 1.52 |
| modified_rankin_score_pretreatment | 1.08 | 0.12 | 0.67581 | 0.499 | 0.86 | 1.36 |
| has_seizuresTRUE | 0.79 | 0.36 | -0.63845 | 0.523 | 0.39 | 1.61 |
| modified_rankin_score_postop_within_1_week | 0.93 | 0.13 | -0.54462 | 0.586 | 0.73 | 1.19 |
| age_at_first_treatment_yrs | 1.01 | 0.04 | 0.18340 | 0.854 | 0.93 | 1.09 |
Adjusted - Recoded
Fit logistic.
# Define columns
cols <- unname(c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
EXPOSURES_CATEGORICAL
))
# Initialize
out <- list()
df <- df_multi
for (i in seq_along(cols)) {
# Create formula
model <- as.formula(paste(
CHOSEN_OUTCOME,
"~",
"age_at_first_treatment_yrs + is_male +",
cols[i]
))
fit <- suppressWarnings(glm(
model,
data = df,
family = binomial()
))
# Calculate robust standard errors (sandwich)
robust_se <- sandwich::vcovHC(fit, type = "HC0")
# Compute robust confidence intervals
robust_ci <- lmtest::coeftest(fit, vcov. = robust_se)
# Summarize model coefficients
fit_summary <-
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
stylized <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
out[[i]] <- stylized
}
Print results.
# Create table
univariable_adjusted_recoded <-
out %>%
bind_rows() %>%
filter(Predictors != "(Intercept)") %>%
filter(!str_starts(Predictors, "age_at_first_treatment_yrs|is_male")) %>%
filter(`CI (high)` < 50) %>%
arrange(`P-values`)
# Print
univariable_adjusted_recoded %>%
sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| modified_rankin_score_final | 0.51 | 0.13 | -5.00299 | 0.000 | 0.40 | 0.67 |
| max_size_cm | 0.68 | 0.08 | -4.76743 | 0.000 | 0.58 | 0.79 |
| spetzler_martin_grade | 0.38 | 0.16 | -5.89898 | 0.000 | 0.28 | 0.53 |
| lawton_young_grade | 0.47 | 0.12 | -6.21220 | 0.000 | 0.37 | 0.59 |
| size_score | 0.32 | 0.24 | -4.79879 | 0.000 | 0.20 | 0.51 |
| is_eloquent_locationTRUE | 0.15 | 0.47 | -3.95835 | 0.000 | 0.06 | 0.39 |
| is_diffuse_nidusTRUE | 0.19 | 0.41 | -4.10478 | 0.000 | 0.08 | 0.42 |
| is_spetzler_martin_grade_less_than_4TRUE | 5.02 | 0.34 | 4.73994 | 0.000 | 2.58 | 9.78 |
| has_hemorrhageTRUE | 2.84 | 0.34 | 3.10071 | 0.002 | 1.47 | 5.49 |
| locationDeep | 0.04 | 1.05 | -2.98542 | 0.003 | 0.01 | 0.34 |
| has_deficitTRUE | 0.44 | 0.32 | -2.59152 | 0.010 | 0.23 | 0.82 |
| venous_drainageSuperficial | 2.51 | 0.35 | 2.59197 | 0.010 | 1.25 | 5.03 |
| has_deep_venous_drainageTRUE | 0.41 | 0.36 | -2.52344 | 0.012 | 0.20 | 0.82 |
| modified_rankin_score_presentation | 1.24 | 0.10 | 2.18286 | 0.029 | 1.02 | 1.51 |
| locationHemispheric | 0.16 | 1.05 | -1.77453 | 0.076 | 0.02 | 1.21 |
| procedure_combinationsMultimodal | 4.60 | 0.98 | 1.56510 | 0.118 | 0.68 | 31.14 |
| has_associated_aneurysmTRUE | 0.58 | 0.38 | -1.41461 | 0.157 | 0.28 | 1.23 |
| has_paresisTRUE | 0.72 | 0.36 | -0.93177 | 0.351 | 0.36 | 1.44 |
| procedure_combinationsR | 2.17 | 1.00 | 0.77563 | 0.438 | 0.31 | 15.48 |
| modified_rankin_score_postop_within_1_week | 0.91 | 0.12 | -0.72063 | 0.471 | 0.72 | 1.17 |
| has_seizuresTRUE | 0.78 | 0.36 | -0.69840 | 0.485 | 0.38 | 1.58 |
| modified_rankin_score_pretreatment | 1.07 | 0.12 | 0.54239 | 0.588 | 0.85 | 1.34 |
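The adjusted ORs above can also be summarized visually. A minimal forest-plot sketch with ggplot2, assuming `univariable_adjusted_recoded` from the chunk above is in scope:

```r
# Forest plot of adjusted odds ratios on a log scale;
# the dashed line at OR = 1 marks the null
univariable_adjusted_recoded %>%
  mutate(Predictors = fct_reorder(Predictors, `Odds Ratios (OR)`)) %>%
  ggplot(aes(x = `Odds Ratios (OR)`, y = Predictors)) +
  geom_point() +
  geom_errorbarh(aes(xmin = `CI (low)`, xmax = `CI (high)`), height = 0.2) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  scale_x_log10() +
  labs(x = "Odds ratio (log scale)", y = NULL)
```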
Interaction analysis
Adjusted - Recoded
Fit logistic.
# Define columns
cols <- unname(c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
EXPOSURES_CATEGORICAL
))
# Remove unwanted columns
unwanted <- c("modified_rankin_score_final")
cols <- cols[!cols %in% unwanted]
adjustors <- c(
"age_at_first_treatment_yrs",
"is_male"
)
# Initialize
out <- list()
df <- df_multi
k <- 1
for (i in seq_along(cols)) {
# Do not fit model for variables that we are adjusting for
if (cols[i] %in% adjustors) {
next
}
# Create formula
model <- as.formula(paste(
CHOSEN_OUTCOME,
"~",
"age_at_first_treatment_yrs + is_male +",
cols[i],
"* has_hemorrhage"
))
fit <- suppressWarnings(glm(
model,
data = df,
family = binomial()
))
# Calculate robust standard errors (sandwich)
robust_se <- sandwich::vcovHC(fit, type = "HC0")
# Compute robust confidence intervals
robust_ci <- lmtest::coeftest(fit, vcov. = robust_se)
# Summarize model coefficients
fit_summary <-
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
stylized <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
out[[k]] <- stylized
k <- k + 1
}
Print all results.
# Define values
unwanted <- c(
"is_maleTRUE",
"age_at_first_treatment_yrs",
"(Intercept)",
"has_hemorrhageTRUE"
)
# Initialize objects
isolated_values <- list()
# Get main and interaction effects
for (i in seq_along(out)) {
# Get data
result <- out[[i]]
# Get all main effects of interest
main_effects <-
result %>%
filter(!Predictors %in% unwanted) %>%
filter(!str_detect(Predictors, ":")) %>%
select(-"SE", -"Z-scores")
# Get the interaction effects
interaction_effects <-
result %>%
filter(!Predictors %in% unwanted) %>%
filter(str_detect(Predictors, ":")) %>%
mutate(Predictors = str_extract(Predictors, "^[^:]+")) %>%
select(-"SE", -"Z-scores") %>%
rename_with(~ paste("Interaction -", .), -Predictors)
isolated_values[[i]] <-
main_effects %>%
left_join(interaction_effects, by = "Predictors")
}
Create and print table.
# Create table
univariable_adjusted_recoded_interactions <-
isolated_values %>%
bind_rows() %>%
arrange(`Interaction - P-values`)
# Print
univariable_adjusted_recoded_interactions %>%
sable()
| Predictors | Odds Ratios (OR) | P-values | CI (low) | CI (high) | Interaction - Odds Ratios (OR) | Interaction - P-values | Interaction - CI (low) | Interaction - CI (high) |
|---|---|---|---|---|---|---|---|---|
| is_eloquent_locationTRUE | 4.8000e-01 | 0.168 | 1.7000e-01 | 1.3600e+00 | 0.00 | 0.000 | 0.00 | 0.00 |
| locationDeep | 2.2000e-01 | 0.189 | 2.0000e-02 | 2.1100e+00 | 0.00 | 0.000 | 0.00 | 0.00 |
| locationHemispheric | 5.5000e-01 | 0.599 | 6.0000e-02 | 5.1400e+00 | 0.00 | 0.000 | 0.00 | 0.00 |
| has_seizuresTRUE | 1.5300e+00 | 0.402 | 5.7000e-01 | 4.1200e+00 | 0.29 | 0.089 | 0.07 | 1.21 |
| is_surgeryTRUE | 6.6434e+07 | 0.000 | 2.8505e+07 | 1.5483e+08 | 0.52 | 0.207 | 0.19 | 1.44 |
| size_score | 2.5000e-01 | 0.000 | 1.3000e-01 | 5.1000e-01 | 1.88 | 0.222 | 0.68 | 5.18 |
| has_paresisTRUE | 9.8000e-01 | 0.974 | 3.0000e-01 | 3.1600e+00 | 0.45 | 0.332 | 0.09 | 2.23 |
| max_size_cm | 6.4000e-01 | 0.000 | 5.0000e-01 | 8.2000e-01 | 1.17 | 0.355 | 0.84 | 1.63 |
| venous_drainageSuperficial | 3.8200e+00 | 0.006 | 1.4800e+00 | 9.8700e+00 | 0.61 | 0.502 | 0.15 | 2.57 |
| has_deep_venous_drainageTRUE | 2.8000e-01 | 0.009 | 1.1000e-01 | 7.2000e-01 | 1.53 | 0.565 | 0.36 | 6.43 |
| is_spetzler_martin_grade_less_than_4TRUE | 5.1700e+00 | 0.001 | 1.9700e+00 | 1.3560e+01 | 0.70 | 0.626 | 0.17 | 2.93 |
| modified_rankin_score_postop_within_1_week | 8.8000e-01 | 0.476 | 6.1000e-01 | 1.2600e+00 | 0.89 | 0.652 | 0.53 | 1.48 |
| modified_rankin_score_presentation | 9.9000e-01 | 0.976 | 6.5000e-01 | 1.5300e+00 | 1.11 | 0.703 | 0.66 | 1.85 |
| has_associated_aneurysmTRUE | 3.9000e-01 | 0.134 | 1.1000e-01 | 1.3400e+00 | 1.36 | 0.706 | 0.27 | 6.82 |
| is_diffuse_nidusTRUE | 1.7000e-01 | 0.004 | 5.0000e-02 | 5.8000e-01 | 1.24 | 0.800 | 0.23 | 6.63 |
| lawton_young_grade | 4.6000e-01 | 0.000 | 3.1000e-01 | 6.9000e-01 | 1.05 | 0.855 | 0.62 | 1.78 |
| has_deficitTRUE | 4.1000e-01 | 0.070 | 1.5000e-01 | 1.0800e+00 | 1.06 | 0.935 | 0.24 | 4.69 |
| modified_rankin_score_pretreatment | 9.1000e-01 | 0.706 | 5.6000e-01 | 1.4700e+00 | 1.01 | 0.970 | 0.58 | 1.77 |
| spetzler_martin_grade | 4.1000e-01 | 0.000 | 2.5000e-01 | 6.5000e-01 | 1.00 | 0.994 | 0.53 | 1.91 |
| procedure_combinationsMultimodal | 2.3406e+08 | 0.00 | ||||||
| procedure_combinationsR | 1.2091e+08 | 0.00 | ||||||
| procedure_combinationsS | 1.2031e+16 | 0.00 |
Eloquence. Note the missing cell below: every hemorrhage-positive, non-eloquent patient was obliterated, a complete-separation pattern that explains the degenerate interaction estimates for eloquence above.
df %>%
count(has_hemorrhage, is_eloquent_location, !!sym(CHOSEN_OUTCOME)) %>%
drop_na() %>%
arrange(has_hemorrhage, is_eloquent_location) %>%
group_by(has_hemorrhage, is_eloquent_location) %>%
mutate(
num_total = sum(n),
pct_total = scales::percent(n / num_total, 1)
) %>%
sable()
| has_hemorrhage | is_eloquent_location | is_obliterated | n | num_total | pct_total |
|---|---|---|---|---|---|
| FALSE | FALSE | FALSE | 6 | 28 | 21% |
| FALSE | FALSE | TRUE | 22 | 28 | 79% |
| FALSE | TRUE | FALSE | 24 | 63 | 38% |
| FALSE | TRUE | TRUE | 39 | 63 | 62% |
| TRUE | FALSE | TRUE | 61 | 61 | 100% |
| TRUE | TRUE | FALSE | 20 | 76 | 26% |
| TRUE | TRUE | TRUE | 56 | 76 | 74% |
Deficit.
df %>%
count(has_hemorrhage, has_deficit, !!sym(CHOSEN_OUTCOME)) %>%
drop_na() %>%
arrange(has_hemorrhage, has_deficit) %>%
group_by(has_hemorrhage, has_deficit) %>%
mutate(
num_total = sum(n),
pct_total = scales::percent(n / num_total, 1)
) %>%
sable()
| has_hemorrhage | has_deficit | is_obliterated | n | num_total | pct_total |
|---|---|---|---|---|---|
| FALSE | FALSE | FALSE | 15 | 57 | 26% |
| FALSE | FALSE | TRUE | 42 | 57 | 74% |
| FALSE | TRUE | FALSE | 15 | 34 | 44% |
| FALSE | TRUE | TRUE | 19 | 34 | 56% |
| TRUE | FALSE | FALSE | 9 | 86 | 10% |
| TRUE | FALSE | TRUE | 77 | 86 | 90% |
| TRUE | TRUE | FALSE | 11 | 51 | 22% |
| TRUE | TRUE | TRUE | 40 | 51 | 78% |
Seizures.
df %>%
count(has_hemorrhage, has_seizures, !!sym(CHOSEN_OUTCOME)) %>%
drop_na() %>%
arrange(has_hemorrhage, has_seizures) %>%
group_by(has_hemorrhage, has_seizures) %>%
mutate(
num_total = sum(n),
pct_total = scales::percent(n / num_total, 1)
) %>%
sable()
| has_hemorrhage | has_seizures | is_obliterated | n | num_total | pct_total |
|---|---|---|---|---|---|
| FALSE | FALSE | FALSE | 23 | 64 | 36% |
| FALSE | FALSE | TRUE | 41 | 64 | 64% |
| FALSE | TRUE | FALSE | 7 | 27 | 26% |
| FALSE | TRUE | TRUE | 20 | 27 | 74% |
| TRUE | FALSE | FALSE | 13 | 108 | 12% |
| TRUE | FALSE | TRUE | 95 | 108 | 88% |
| TRUE | TRUE | FALSE | 7 | 29 | 24% |
| TRUE | TRUE | TRUE | 22 | 29 | 76% |
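Where the cross-tabs above show complete separation, maximum likelihood pushes the corresponding interaction coefficients toward infinity. One standard remedy is Firth's penalized likelihood; a sketch using the logistf package (not loaded above, so its availability is an assumption), shown for the eloquence-by-hemorrhage interaction:

```r
library(logistf)
# Firth-penalized logistic fit; finite estimates exist even under
# complete separation, unlike ordinary glm()
fit_firth <- logistf(
  as.formula(paste(
    CHOSEN_OUTCOME,
    "~ age_at_first_treatment_yrs + is_male +",
    "is_eloquent_location * has_hemorrhage"
  )),
  data = df
)
summary(fit_firth)
```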
Selective inference
Read the following guides:
Setup
Define unwanted columns.
unwanted_all <- c(
# Use values at or before first treatment
"modified_rankin_score_postop_within_1_week",
"modified_rankin_score_final",
# Use the split between high vs. low grade instead of the granular
"spetzler_martin_grade"
)
# Remove hemorrhage from the mix
if (str_detect(params$TITLE, "Hemorrhage")) {
unwanted_all <- c(unwanted_all, "has_hemorrhage")
}
unwanted_without_gradings <- c(
unwanted_all,
# Spetzler-Martin grade (size score + eloquent location + venous drainage)
"is_spetzler_martin_grade_less_than_4",
"size_score", # Already covered by the more detailed max_size_cm
"venous_drainage", # Already covered by has_deep_venous_drainage
"location", # Already covered to some extent by eloquence
# Lawton-Young grade (SM + age + diffuse nidus + hemorrhage)
# (SM is heavily overlapping with LY, so include its components)
"lawton_young_grade",
# Use mRS pre-treatment (more predictive)
"modified_rankin_score_presentation" # "modified_rankin_score_pretreatment"
)
# Define cols of interest - use grading scores whenever these exist
unwanted_with_gradings <- c(
# Spetzler-Martin grade (size score + eloquent location + venous drainage)
"is_spetzler_martin_grade_less_than_4", # The grade has lower p-value
"size_score",
"max_size_cm",
"is_eloquent_location",
# "location", # Include this as not highly correlated with SM
"has_deep_venous_drainage",
"venous_drainage",
# Lawton-Young grade (SM + age + diffuse nidus + hemorrhage)
# (SM is heavily overlapping with LY, so include its components)
"lawton_young_grade",
# "is_diffuse_nidus",
# "has_hemorrhage",
# Use values at or before first treatment
"modified_rankin_score_final",
"modified_rankin_score_postop_within_1_week",
# Use mRS pre-treatment (more predictive)
"modified_rankin_score_presentation"
# "modified_rankin_score_pretreatment"
)
Create dataset.
# Define cols
cols <- unname(c(
EXPOSURES_CONTINUOUS,
EXPOSURES_BINARY,
EXPOSURES_CATEGORICAL,
CHOSEN_OUTCOME
))
# Define column sets for each analysis
cols_all <- cols[!cols %in% unwanted_all]
cols_with_gradings <- cols[!cols %in% unwanted_with_gradings]
cols_without_gradings <- cols[!cols %in% unwanted_without_gradings]
# Create df of interest
df_all <-
# Use the non-recoded dataset
df_uni %>%
dplyr::select(all_of(cols_all)) %>%
drop_na()
df_with_gradings <-
# Use the non-recoded dataset
df_uni %>%
dplyr::select(all_of(cols_with_gradings)) %>%
drop_na()
df_no_scores <-
# Use the non-recoded dataset
df_uni %>%
dplyr::select(all_of(cols_without_gradings)) %>%
drop_na()
# Create formula
frla <- as.formula(paste(CHOSEN_OUTCOME, " ~ . - 1"))
# Create matrices with all variables
X_all <- model.matrix(frla, df_all)
y_all <- df_all %>%
pull(CHOSEN_OUTCOME) %>%
as.numeric()
# Create matrices with scores
X_with_gradings <- model.matrix(frla, df_with_gradings)
y_with_gradings <- df_with_gradings %>%
pull(CHOSEN_OUTCOME) %>%
as.numeric()
# Create matrices with score components
X_without_gradings <- model.matrix(frla, df_no_scores)
y_without_gradings <- df_no_scores %>%
pull(CHOSEN_OUTCOME) %>%
as.numeric()
# X should be centered for LASSO (necessary for glmnet)
# (scale = FALSE as this is done by the standardization step in glmnet)
X_all_scaled <- scale(X_all, center = T, scale = F)
X_with_gradings_scaled <- scale(X_with_gradings, center = T, scale = F)
X_without_gradings_scaled <- scale(X_without_gradings, center = T, scale = F)
# See the names of X to match column numbers to column names later on
colnames(X_without_gradings_scaled)
[1] "age_at_first_treatment_yrs" "modified_rankin_score_pretreatment"
[3] "max_size_cm" "is_maleFALSE"
[5] "is_maleTRUE" "is_eloquent_locationTRUE"
[7] "has_deep_venous_drainageTRUE" "is_diffuse_nidusTRUE"
[9] "has_hemorrhageTRUE" "has_seizuresTRUE"
[11] "has_deficitTRUE" "has_paresisTRUE"
[13] "has_associated_aneurysmTRUE" "is_surgeryTRUE"
[15] "procedure_combinationsER" "procedure_combinationsERS"
[17] "procedure_combinationsES" "procedure_combinationsR"
[19] "procedure_combinationsRS" "procedure_combinationsS"
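Before selection, a quick events-per-variable check helps judge whether the candidate matrix is too wide for the number of outcome events (the ~10 events per predictor heuristic is a rough rule of thumb, not a hard requirement). Assuming the matrices from the chunk above are in scope:

```r
# Events-per-variable: minority-class count divided by candidate predictors
n_events <- min(sum(y_without_gradings == 1), sum(y_without_gradings == 0))
n_events / ncol(X_without_gradings_scaled)
```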
Stepwise
Use forward stepwise selection. Note that fs() and fsInf() assume a Gaussian response, so applying them to this binary outcome is an approximation.
# Set seed
set.seed(33)
# Run forward step-wise
fsfit <- fs(
x = X_without_gradings_scaled,
y = y_without_gradings,
maxsteps = 2000,
intercept = TRUE,
normalize = FALSE
)
# Estimate sigma
sigmahat <- estimateSigma(
x = X_without_gradings_scaled,
y = y_without_gradings,
intercept = TRUE,
standardize = FALSE
)$sigmahat
# Compute sequential p-values and confidence intervals - for all
# (sigma estimated from full model)
out.seq <- fsInf(fsfit, type = "active", sigma = sigmahat)
out.seq
Call:
fsInf(obj = fsfit, sigma = sigmahat, type = "active")
Standard deviation of noise (specified or estimated) sigma = 0.348
Sequential testing results with alpha = 0.100
Step Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
1 14 3.4100e-01 7.153 0.052 -5.0000e-03 4.0700e-01 0.050 0.049
2 3 -5.9000e-02 -5.794 0.000 -1.4500e-01 -4.4000e-02 0.050 0.050
3 11 -1.2400e-01 -2.546 0.194 -2.0300e-01 1.1200e-01 0.049 0.049
4 8 -1.3200e-01 -1.683 0.296 -5.5300e-01 3.0700e-01 0.050 0.050
5 5 -5.2000e-02 -1.101 1.000 4.6830e+00 Inf 0.000 0.000
6 9 5.1000e-02 1.029 1.000 -Inf -5.0030e+00 0.000 NaN
7 6 -5.4000e-02 -0.986 1.000 5.4860e+00 Inf NaN 0.000
8 17 6.7000e-02 0.936 1.000 -Inf -7.1590e+00 0.000 NaN
9 4 4.2012e+13 0.784 1.000 -Inf -5.3557e+15 0.000 0.000
10 18 4.5000e-02 0.599 1.000 -Inf -7.5710e+00 0.000 NaN
11 15 2.5200e-01 1.514 1.000 -Inf -1.6624e+01 0.000 NaN
12 1 2.0000e-03 0.358 1.000 -Inf -6.2400e-01 0.000 NaN
13 16 -3.0000e-02 -0.325 1.000 9.1210e+00 Inf NaN 0.000
14 7 6.0000e-03 0.117 1.000 -Inf -5.4510e+00 0.000 NaN
15 12 6.0000e-03 0.074 1.000 -Inf -8.2730e+00 0.000 NaN
16 20 -5.0000e-03 -0.052 1.000 9.7130e+00 Inf NaN 0.000
17 19 -5.1480e+13 -1.811 1.000 2.8423e+15 Inf 0.000 0.000
18 13 -6.0000e-03 -0.099 1.000 6.2280e+00 Inf 0.000 0.000
19 10 4.0000e-03 0.062 1.000 -Inf -5.7020e+00 0.000 0.000
20 2 1.0000e-03 0.049 1.000 -Inf -1.9970e+00 0.000 NaN
Estimated stopping point from ForwardStop rule = 3
# Compute p-values and confidence intervals after AIC stopping - for selected
out.aic <- fsInf(fsfit, type = "aic", sigma = sigmahat)
out.aic
Call:
fsInf(obj = fsfit, sigma = sigmahat, type = "aic")
Standard deviation of noise (specified or estimated) sigma = 0.348
Testing results at step = 4, with alpha = 0.100
Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
14 0.298 6.178 1 -Inf -4.816 0 0
3 -0.046 -3.927 1 1.176 Inf NaN 0
11 -0.113 -2.314 1 4.901 Inf NaN 0
8 -0.132 -1.683 1 7.847 Inf NaN 0
Estimated stopping point from AIC rule = 4
# Compute p-values and confidence intervals after 5 fixed steps
out.fix <- fsInf(fsfit, type = "all", k = 5, sigma = sigmahat)
out.fix
Call:
fsInf(obj = fsfit, sigma = sigmahat, k = 5, type = "all")
Standard deviation of noise (specified or estimated) sigma = 0.348
Testing results at step = 5, with alpha = 0.100
Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
14 0.298 6.178 1 -Inf -4.816 0 0
3 -0.046 -3.928 1 1.176 Inf NaN 0
11 -0.113 -2.307 1 4.902 Inf NaN 0
8 -0.134 -1.701 1 7.848 Inf 0 0
5 -0.052 -1.101 1 4.683 Inf 0 0
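fsInf reports the ForwardStop stopping point (3 above); for reference, the rule is simple to compute directly from the sequential p-values. A self-contained sketch of the ForwardStop rule of G'Sell et al.:

```r
# ForwardStop: stop at the largest step k whose running mean of
# -log(1 - p) stays at or below alpha; returns 0 if no step qualifies
forward_stop <- function(pvals, alpha = 0.10) {
  stats <- cumsum(-log(1 - pvals)) / seq_along(pvals)
  max(c(0, which(stats <= alpha)))
}
forward_stop(c(0.01, 0.02, 0.90, 0.90))  # stops after step 2
```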
LASSO - all
Use LASSO logistic regression with all available variables.
# Set seed
set.seed(141845)
# Define values
X <- X_all
y <- y_all
# Perform cross-validation
# (alpha = 1 for lasso, alpha = 0 for ridge, 0 < alpha < 1 for elastic net)
cv_fit <- cv.glmnet(
x = X,
y = y,
family = "binomial",
alpha = 1,
nfolds = 10
)
# Extract the best lambda
best_lambda_min <- cv_fit$lambda.min
best_lambda_1se <- cv_fit$lambda.1se
# Run glmnet
gfit <- glmnet(
x = X,
y = y,
family = "binomial",
alpha = 1,
intercept = TRUE,
standardize = TRUE
)
# Extract coefficients at the chosen lambda, keeping the intercept term
# (fixedLassoInf with family = "binomial" expects beta including the intercept)
# (given the small number of observations hence large SD, use min lambda)
# Note: glmnet minimizes the 1/n-scaled objective; see the selectiveInference
# docs for the lambda rescaling convention before interpreting these intervals
lambda <- best_lambda_min
n <- nrow(X)
beta <- coef(
object = gfit,
x = X,
y = y,
s = lambda,
exact = TRUE
)
# Estimate sigma
gsigmahat <- estimateSigma(
x = X,
y = y,
intercept = TRUE,
standardize = TRUE
)$sigmahat
# Compute fixed lambda p-values and selection intervals
out <- fixedLassoInf(
x = X,
y = y,
beta = beta,
lambda = lambda,
# Level of significance
alpha = 0.1,
family = "binomial",
intercept = TRUE,
sigma = gsigmahat
)
# Print model (skip gracefully if no variable was selected)
if (length(out$vars) > 0) {
tibble(
names = names(out$vars),
odds_ratio = exp(out$coef0),
pvalue = out$pv,
ci_lo = exp(out$ci[, 1]),
ci_hi = exp(out$ci[, 2])
) %>%
arrange(pvalue) %>%
sable()
} else {
print("No variables were selected")
}
| names | odds_ratio | pvalue | ci_lo | ci_hi |
|---|---|---|---|---|
| has_deficitTRUE | 0.47156 | 0.12472 | 0.22613 | 1.4425e+00 |
| lawton_young_grade | 0.78074 | 0.12708 | 0.11757 | 1.2831e+00 |
| venous_drainageDeep | 2.59309 | 0.14339 | 0.56626 | 5.4558e+00 |
| max_size_cm | 0.86935 | 0.23075 | 0.64812 | 1.2105e+00 |
| is_surgeryTRUE | 7.70326 | 0.23918 | 0.00148 | 1.0548e+08 |
| procedure_combinationsERS | 0.54856 | 0.37549 | 0.00000 | 3.0254e+04 |
| locationDeep | 0.56829 | 0.38849 | 0.24631 | 5.2677e+00 |
| locationCorpus Callosum | 5.61645 | 0.44418 | 0.00231 | 4.1000e+01 |
| procedure_combinationsS | 4.22752 | 0.57310 | 0.00000 | 8.1359e+03 |
| procedure_combinationsES | 2.34490 | 0.70317 | 0.00000 | 3.0948e+03 |
| is_maleFALSE | 1.51025 | 0.72140 | 0.00704 | 2.3201e+00 |
| is_diffuse_nidusTRUE | 0.79325 | 0.87127 | 0.51135 | 4.5430e+05 |
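To sanity-check the chosen penalty, the cross-validation curve can be plotted with the two candidate lambdas marked (assuming `cv_fit` from the chunk above is still in scope):

```r
# CV deviance curve on the log(lambda) axis;
# dashed lines mark lambda.min and lambda.1se
plot(cv_fit)
abline(v = log(c(cv_fit$lambda.min, cv_fit$lambda.1se)), lty = 2)
```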
LASSO - without scores
Use LASSO logistic regression without the prior grading scores; their components enter individually.
# Set seed
set.seed(141845)
# Define values
X <- X_without_gradings
y <- y_without_gradings
# Perform cross-validation
# (alpha = 1 for lasso, alpha = 0 for ridge, 0 < alpha < 1 for elastic net)
cv_fit <- cv.glmnet(
x = X,
y = y,
family = "binomial",
alpha = 1,
nfolds = 10
)
# Extract the best lambda
best_lambda_min <- cv_fit$lambda.min
best_lambda_1se <- cv_fit$lambda.1se
# Run glmnet
gfit <- glmnet(
x = X,
y = y,
family = "binomial",
alpha = 1,
intercept = TRUE,
standardize = TRUE
)
# Extract coefficients at the chosen lambda, keeping the intercept term
# (fixedLassoInf with family = "binomial" expects beta including the intercept)
# (given the small number of observations hence large SD, use min lambda)
lambda <- best_lambda_min
n <- nrow(X)
beta <- coef(
object = gfit,
x = X,
y = y,
s = lambda,
exact = TRUE
)
# Estimate sigma
gsigmahat <- estimateSigma(
x = X,
y = y,
intercept = TRUE,
standardize = TRUE
)$sigmahat
# Compute fixed lambda p-values and selection intervals
out <- fixedLassoInf(
x = X,
y = y,
beta = beta,
lambda = lambda,
# Level of significance
alpha = 0.1,
family = "binomial",
intercept = TRUE,
sigma = gsigmahat
)
# Print model (skip gracefully if no variable was selected)
if (length(out$vars) > 0) {
tibble(
names = names(out$vars),
odds_ratio = exp(out$coef0),
pvalue = out$pv,
ci_lo = exp(out$ci[, 1]),
ci_hi = exp(out$ci[, 2])
) %>%
arrange(pvalue) %>%
sable()
} else {
print("No variables were selected")
}
| names | odds_ratio | pvalue | ci_lo | ci_hi |
|---|---|---|---|---|
| max_size_cm | 0.80106 | 0.01944 | 0.65747 | 9.5181e-01 |
| has_deficitTRUE | 0.45615 | 0.08523 | 0.23516 | 1.2026e+00 |
| is_surgeryTRUE | 8.17566 | 0.11092 | 0.11722 | 2.2923e+10 |
| is_eloquent_locationTRUE | 0.47130 | 0.20210 | 0.20176 | 2.1621e+00 |
| procedure_combinationsERS | 0.47423 | 0.21252 | 0.00000 | 1.7332e+02 |
| is_diffuse_nidusTRUE | 0.62299 | 0.43902 | 0.26334 | 6.6364e+00 |
| has_hemorrhageTRUE | 1.47309 | 0.51406 | 0.13455 | 2.6871e+00 |
| procedure_combinationsS | 3.73946 | 0.70483 | 0.00000 | 2.3150e+02 |
| is_maleFALSE | 1.37816 | 0.72311 | 0.01226 | 2.3395e+00 |
| procedure_combinationsES | 2.06932 | 0.81869 | 0.00000 | 6.8686e+01 |
LASSO - without components
Use LASSO logistic regression with the prior grading scores in place of their individual components.
# Set seed
set.seed(141845)
# Define values
X <- X_with_gradings
y <- y_with_gradings
# Perform cross-validation
# (alpha = 1 for lasso, alpha = 0 for ridge, 0 < alpha < 1 for elastic net)
cv_fit <- cv.glmnet(
x = X,
y = y,
family = "binomial",
alpha = 1,
nfolds = 10
)
# Extract the best lambda
best_lambda_min <- cv_fit$lambda.min
best_lambda_1se <- cv_fit$lambda.1se
# Run glmnet
gfit <- glmnet(
x = X,
y = y,
family = "binomial",
alpha = 1,
intercept = TRUE,
standardize = TRUE
)
# Extract coefficients at the chosen lambda - keep the full vector, intercept
# included, as fixedLassoInf below is called with family = "binomial"
# (given the small number of observations, and hence large SD, use the min lambda)
lambda <- best_lambda_min
beta <- coef(
gfit,
x = X,
y = y,
s = lambda,
exact = TRUE
)
# Estimate sigma (NOTE: estimateSigma fits a Gaussian linear model, so this is an approximation for a binary outcome)
gsigmahat <- estimateSigma(
x = X,
y = y,
intercept = TRUE,
standardize = TRUE
)$sigmahat
# Compute fixed lambda p-values and selection intervals
out <- fixedLassoInf(
x = X,
y = y,
beta = beta,
lambda = lambda,
# Level of significance
alpha = 0.1,
family = "binomial",
intercept = TRUE,
sigma = gsigmahat
)
# Print the model, guarding against the case where no variable was selected
if (length(out$vars) > 0) {
tibble(
names = names(out$vars),
odds_ratio = exp(out$coef0),
pvalue = out$pv,
ci_lo = exp(out$ci[, 1]),
ci_hi = exp(out$ci[, 2])
) %>%
arrange(pvalue) %>%
sable()
} else {
print("No variables were selected")
}
| names | odds_ratio | pvalue | ci_lo | ci_hi |
|---|---|---|---|---|
| spetzler_martin_grade | 0.57068 | 0.01298 | 0.29968 | 8.5037e-01 |
| is_surgeryTRUE | 11.64518 | 0.07239 | 0.64462 | 1.6585e+03 |
| has_deficitTRUE | 0.43738 | 0.12524 | 0.23325 | 1.4960e+00 |
| is_diffuse_nidusTRUE | 0.60971 | 0.24588 | 0.01107 | 4.5366e+00 |
| procedure_combinationsERS | 0.34890 | 0.33449 | 0.00336 | 3.3700e+01 |
| locationCorpus Callosum | 7.89072 | 0.40014 | 0.00332 | 5.1507e+01 |
| has_hemorrhageTRUE | 1.44913 | 0.44167 | 0.18339 | 4.7580e+00 |
| is_maleFALSE | 1.38509 | 0.55115 | 0.00451 | 4.9059e+01 |
| procedure_combinationsS | 2.07997 | 0.76280 | 0.00000 | 1.8295e+01 |
| has_seizuresTRUE | 0.75036 | 0.94815 | 0.96910 | 7.3753e+13 |
Multivariable - Without scores
Feature selection
Create predictors.
# Define p-value threshold
pval_threshold <- 0.2
# Get the predictors of interest
predictors <-
univariable_adjusted_recoded %>%
bind_rows(univariable_unadjusted_recoded) %>%
filter(`P-values` < pval_threshold) %>%
pull(`Predictors`)
# Clean categorical variables
predictors <-
predictors %>%
str_replace("^([a-z0-9_]*)([A-Z].*)$", "\\1") %>%
unique()
# Add adjustment variables (none here) and deduplicate
predictors <- unique(c(predictors))
# Remove unwanted predictors
predictors <- predictors[!predictors %in% unwanted_without_gradings]
# Print
print(predictors)
[1] "max_size_cm" "is_eloquent_location" "is_diffuse_nidus"
[4] "has_hemorrhage" "has_deficit" "has_deep_venous_drainage"
[7] "procedure_combinations" "has_associated_aneurysm"
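The `str_replace` call above maps each dummy-coded coefficient name back to its underlying variable by stripping the appended factor level, which starts with an uppercase character. The same pattern with base R's `sub`, on a few illustrative terms:

```r
# Strip appended factor levels (uppercase suffix) from coefficient names;
# names without an uppercase suffix are left unchanged
terms <- c("has_deficitTRUE", "locationCorpus Callosum", "max_size_cm")
vars <- unique(sub("^([a-z0-9_]*)([A-Z].*)$", "\\1", terms))
vars  # "has_deficit" "location" "max_size_cm"
```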
Model selection
Fit logistic.
# Define data
df <- df_multi %>% drop_na(modified_rankin_score_pretreatment, location)
# Create formula
model <- as.formula(paste(
CHOSEN_OUTCOME,
"~",
paste(predictors, collapse = " + ")
))
fit <- glm(
model,
data = df,
family = binomial()
)
# Save
fit_without_grading <- fit
Print results.
# Summarize model coefficients
fit_summary <-
fit %>%
broom::tidy(exponentiate = T, conf.int = T) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
stylized <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
# Print
stylized %>% sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| max_size_cm | 7.9000e-01 | 0.10 | -2.49629 | 0.013 | 0.65 | 9.5000e-01 |
| procedure_combinationsMultimodal | 1.2830e+01 | 1.11 | 2.30563 | 0.021 | 1.50 | 1.3568e+02 |
| is_eloquent_locationTRUE | 3.5000e-01 | 0.56 | -1.89264 | 0.058 | 0.10 | 9.7000e-01 |
| procedure_combinationsR | 4.7500e+00 | 1.15 | 1.35326 | 0.176 | 0.51 | 5.4360e+01 |
| has_hemorrhageTRUE | 1.7200e+00 | 0.42 | 1.30855 | 0.191 | 0.77 | 3.9300e+00 |
| has_deficitTRUE | 6.4000e-01 | 0.40 | -1.10472 | 0.269 | 0.29 | 1.4100e+00 |
| is_diffuse_nidusTRUE | 6.1000e-01 | 0.54 | -0.90509 | 0.365 | 0.21 | 1.8100e+00 |
| has_deep_venous_drainageTRUE | 7.6000e-01 | 0.44 | -0.61510 | 0.538 | 0.32 | 1.7900e+00 |
| has_associated_aneurysmTRUE | 8.2000e-01 | 0.51 | -0.39268 | 0.695 | 0.31 | 2.2800e+00 |
| procedure_combinationsS | 3.0835e+08 | 1426.07 | 0.01371 | 0.989 | 0.00 | 4.9623e+224 |
| (Intercept) | 2.3000e+00 | 1.19 | 0.70056 | 0.484 | 0.20 | 2.5580e+01 |
Inference
Create robust CIs using the sandwich estimator.
# Calculate robust standard errors
robust_se <- sandwich::vcovHC(fit, type = "HC0")
# Compute robust confidence intervals
robust_ci <- lmtest::coeftest(fit, vcov. = robust_se)
Stylize results.
# Summarize model coefficients
fit_summary <-
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(exponentiate = F, conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
multivariable_pvalue <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
# Print
multivariable_pvalue %>% sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| procedure_combinationsS | 3.0835e+08 | 0.80 | 24.58252 | 0.000 | 6.4896e+07 | 1.4652e+09 |
| procedure_combinationsMultimodal | 1.2830e+01 | 0.82 | 3.10270 | 0.002 | 2.5600e+00 | 6.4350e+01 |
| max_size_cm | 7.9000e-01 | 0.10 | -2.38843 | 0.017 | 6.5000e-01 | 9.6000e-01 |
| is_eloquent_locationTRUE | 3.5000e-01 | 0.56 | -1.89442 | 0.058 | 1.2000e-01 | 1.0400e+00 |
| procedure_combinationsR | 4.7500e+00 | 0.88 | 1.77739 | 0.076 | 8.5000e-01 | 2.6460e+01 |
| has_hemorrhageTRUE | 1.7200e+00 | 0.42 | 1.28156 | 0.200 | 7.5000e-01 | 3.9500e+00 |
| has_deficitTRUE | 6.4000e-01 | 0.38 | -1.14702 | 0.251 | 3.0000e-01 | 1.3700e+00 |
| is_diffuse_nidusTRUE | 6.1000e-01 | 0.56 | -0.87372 | 0.382 | 2.0000e-01 | 1.8400e+00 |
| has_deep_venous_drainageTRUE | 7.6000e-01 | 0.47 | -0.57248 | 0.567 | 3.0000e-01 | 1.9200e+00 |
| has_associated_aneurysmTRUE | 8.2000e-01 | 0.49 | -0.40306 | 0.687 | 3.1000e-01 | 2.1600e+00 |
| (Intercept) | 2.3000e+00 | 1.00 | 0.83504 | 0.404 | 3.3000e-01 | 1.6320e+01 |
Plot the coefficients with their confidence intervals.
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(exponentiate = F, conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
filter(term != "(Intercept)") %>%
arrange(desc(estimate)) %>%
mutate(term = factor(term, levels = term)) %>%
ggplot(aes(x = estimate, y = term, color = term)) +
geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
geom_vline(xintercept = 1, linetype = 2, alpha = 0.5) +
labs(x = "Odds ratio", y = NULL) +
guides(color = F) +
theme(
axis.text = element_text(size = 12),
axis.title = element_text(size = 13)
)
Predict
Predict.
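The chunk that computes `predicted_probs` is not shown here; presumably it takes the in-sample fitted probabilities from the model. A self-contained sketch of that presumed step on a built-in dataset (`fit_demo` is illustrative only):

```r
# Presumed shape of the elided step: predicted probabilities on the
# training data via type = "response" (demo model for self-containment)
fit_demo <- glm(am ~ wt, data = mtcars, family = binomial())
predicted_probs <- predict(fit_demo, type = "response")
range(predicted_probs)  # probabilities in [0, 1]
```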
Plot histogram of predictions.
tibble(
Predictions = predicted_probs,
Truth = fit$model[, CHOSEN_OUTCOME]
) %>%
pivot_longer(
cols = everything(),
names_to = "keys",
values_to = "values"
) %>%
mutate(keys = fct_inorder(keys)) %>%
ggplot(aes(x = values, color = keys, fill = keys)) +
geom_histogram(alpha = 0.7) +
geom_vline(xintercept = 0.5, linetype = 2, alpha = 0.5) +
facet_wrap(vars(keys), ncol = 1, scales = "fixed") +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(size = 12),
strip.text.x = element_text(size = 13),
legend.position = "none"
)
Evaluate
How many examples were used?
# All examples
num_all_examples <- nrow(fit$data)
num_all_examples_positive <- sum(fit$data[, CHOSEN_OUTCOME], na.rm = T)
num_complete_cases <- nrow(fit$model)
num_complete_cases_positive <- sum(fit$model[, CHOSEN_OUTCOME], na.rm = T)
# Print
sprintf(
"
All examples = %s (%s with outcome)
Fitted examples = %s (%s with outcome)",
num_all_examples,
num_all_examples_positive,
num_complete_cases,
num_complete_cases_positive
) %>% cat()
All examples = 225 (176 with outcome)
Fitted examples = 223 (175 with outcome)
Residual analysis.
Pseudo R-squared.
fitting null model for pseudo-r2
llh llhNull G2 McFadden r2ML r2CU
-81.92613 -116.14411 68.43596 0.29462 0.26427 0.40837
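As a sanity check, the summary statistics above can be recomputed directly from the two log-likelihoods:

```r
# Recompute McFadden's pseudo-R2 and the G2 (likelihood-ratio) statistic
# from the log-likelihoods reported above
llh <- -81.92613
llh_null <- -116.14411
mcfadden <- 1 - llh / llh_null
g2 <- -2 * (llh_null - llh)
round(c(McFadden = mcfadden, G2 = g2), 5)  # 0.29462, 68.43596
```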
ROC-AUC.
# Compute
auroc <- cvAUC::ci.cvAUC(
predictions = predicted_probs,
labels = fit$model[, CHOSEN_OUTCOME]
)
# Print
sprintf(
"Cross-validated AUROC = %.3f (SE, %.3f; %s%% CI, %.3f-%.3f)",
auroc$cvAUC,
auroc$se,
auroc$confidence * 100,
auroc$ci[1],
auroc$ci[2]
)
[1] "Cross-validated AUROC = 0.847 (SE, 0.028; 95% CI, 0.792-0.902)"
PR-AUC.
# Construct object
precrec_obj_tr <- precrec::evalmod(
scores = predicted_probs,
labels = fit$model[, CHOSEN_OUTCOME],
calc_avg = F,
cb_alpha = 0.05
)
# Print
# NOTE: Can use precrec::auc_ci to get CIs, but need to have multiple datasets
precrec_obj_tr %>%
precrec::part() %>%
precrec::pauc() %>%
sable()
| modnames | dsids | curvetypes | paucs | spaucs |
|---|---|---|---|---|
| m1 | 1 | ROC | 0.84702 | 0.84702 |
| m1 | 1 | PRC | 0.95436 | 0.95436 |
ROC and PR curves.
Another version of the ROC curve.
# Calculate ROC curve
roc_curve <- pROC::roc(fit$model[, CHOSEN_OUTCOME] ~ predicted_probs)
# Plot ROC curve
plot(roc_curve)
Area under the curve: 0.847
Plot all performance measures.
# Construct object
precrec_obj_tr2 <- precrec::evalmod(
scores = predicted_probs,
labels = fit$model[, CHOSEN_OUTCOME],
mode = "basic"
)
# Plot
precrec_obj_tr2 %>% autoplot()
Confusion matrix.
Kappa statistic: 0 = no agreement; 0.1-0.2 = slight agreement; 0.2-0.4 = fair agreement; 0.4-0.6 = moderate agreement; 0.6-0.8 = substantial agreement; 0.8-1.0 = near-perfect agreement.
No Information Rate (NIR): the proportion of the largest observed class. For example, if 23 of 35 patients had the outcome, that is the largest class and the NIR is 23/35 = 0.657. The larger the deviation of the accuracy from the NIR, the more confident we are that the model is not simply predicting the largest class.
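Both quantities can be verified by hand from the confusion-matrix counts reported below (165, 27, 10, 21):

```r
# Hand-computation of NIR, accuracy, and Cohen's kappa from the counts in
# the confusion matrix below
tp <- 165; fp <- 27; fn <- 10; tn <- 21
n <- tp + fp + fn + tn
nir <- max(tp + fn, fp + tn) / n                             # largest class
acc <- (tp + tn) / n                                         # observed agreement
pe <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2  # chance agreement
kappa <- (acc - pe) / (1 - pe)
round(c(NIR = nir, Accuracy = acc, Kappa = kappa), 3)  # 0.785, 0.834, 0.436
```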
# Scores
pred <- factor(ifelse(predicted_probs > 0.5, "Event", "No Event"))
truth <- factor(ifelse(fit$model[, CHOSEN_OUTCOME], "Event", "No Event"))
# Count values per category
xtab <- table(
scores = pred,
labels = truth
)
# Compute
confusion_matrix <-
caret::confusionMatrix(
xtab,
positive = "Event",
prevalence = NULL, # Provide prevalence as a proportion here if you want
mode = "sens_spec"
)
# Print
confusion_matrix
Confusion Matrix and Statistics
labels
scores Event No Event
Event 165 27
No Event 10 21
Accuracy : 0.834
95% CI : (0.779, 0.88)
No Information Rate : 0.785
P-Value [Acc > NIR] : 0.04050
Kappa : 0.436
Mcnemar's Test P-Value : 0.00853
Sensitivity : 0.943
Specificity : 0.438
Pos Pred Value : 0.859
Neg Pred Value : 0.677
Prevalence : 0.785
Detection Rate : 0.740
Detection Prevalence : 0.861
Balanced Accuracy : 0.690
'Positive' Class : Event
Variable importance
Plot importance (based on magnitude of z-score).
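For a `glm`, `caret::varImp` reduces (as far as I understand its default) to the absolute value of each coefficient's z-statistic, which can be read straight off `summary()`; a small demo on a built-in dataset:

```r
# Sketch: variable importance for a glm as |z|, taken from summary();
# this should mirror caret::varImp's default for glm objects
fit_demo <- glm(am ~ wt + hp, data = mtcars, family = binomial())
z <- summary(fit_demo)$coefficients[, "z value"]
importance <- abs(z[setdiff(names(z), "(Intercept)")])
sort(importance, decreasing = TRUE)
```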
# Calculate importance
varimp <-
fit %>%
caret::varImp() %>%
as_tibble(rownames = "rn") %>%
mutate(Predictor = factor(rn) %>% fct_reorder(Overall)) %>%
dplyr::select(Predictor, Importance = Overall)
# Plot variable importance
varimp %>%
ggplot(aes(Predictor, Importance, fill = Importance)) +
geom_bar(stat = "identity", alpha = 0.9, width = 0.65) +
coord_flip() +
guides(fill = F) +
labs(y = "\nVariable Importance", x = NULL) +
scale_fill_viridis_c() +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
panel.border = element_blank(),
axis.text.x = element_text(margin = margin(t = 7), size = 12),
axis.text.y = element_text(margin = margin(r = -10), size = 12),
axis.title = element_text(size = 13)
)
Multivariable - Without components
Feature selection
Create predictors.
# Define p-value threshold
pval_threshold <- 0.2
# Get the predictors of interest
predictors <-
univariable_unadjusted_recoded %>%
bind_rows(univariable_adjusted_recoded) %>%
filter(`P-values` < pval_threshold) %>%
pull(`Predictors`)
# Clean categorical variables
predictors <-
predictors %>%
str_replace("^([a-z0-9_]*)([A-Z].*)$", "\\1") %>%
unique()
# Add adjustment variables (none here) and deduplicate
predictors <- unique(c(predictors))
# Remove unwanted predictors
predictors <- predictors[!predictors %in% unwanted_with_gradings]
# Print
print(predictors)
[1] "spetzler_martin_grade" "is_diffuse_nidus" "has_hemorrhage"
[4] "location" "has_deficit" "procedure_combinations"
[7] "has_associated_aneurysm" "is_surgery"
Model selection
Fit logistic.
# Define data
df <- df_multi
# Create formula
model <- as.formula(paste(
CHOSEN_OUTCOME,
"~",
paste(predictors, collapse = " + ")
))
fit <- glm(
model,
data = df,
family = binomial()
)
fit_with_grading <- fit
Print results.
# Summarize model coefficients
fit_summary <-
fit %>%
broom::tidy(exponentiate = T, conf.int = T) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
stylized <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
# Print
stylized %>% sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| spetzler_martin_grade | 5.0000e-01 | 0.24 | -2.89399 | 0.004 | 0.31 | 7.9000e-01 |
| procedure_combinationsMultimodal | 1.9890e+01 | 1.34 | 2.23269 | 0.026 | 1.68 | 3.6527e+02 |
| procedure_combinationsR | 7.8800e+00 | 1.37 | 1.51158 | 0.131 | 0.62 | 1.4862e+02 |
| locationDeep | 2.7000e-01 | 0.88 | -1.50208 | 0.133 | 0.03 | 1.2500e+00 |
| has_hemorrhageTRUE | 1.8000e+00 | 0.41 | 1.45140 | 0.147 | 0.82 | 4.0400e+00 |
| locationHemispheric | 2.9000e-01 | 0.86 | -1.42659 | 0.154 | 0.04 | 1.3100e+00 |
| has_deficitTRUE | 6.2000e-01 | 0.41 | -1.17787 | 0.239 | 0.27 | 1.3900e+00 |
| is_diffuse_nidusTRUE | 6.2000e-01 | 0.54 | -0.87629 | 0.381 | 0.22 | 1.8200e+00 |
| has_associated_aneurysmTRUE | 7.4000e-01 | 0.50 | -0.58705 | 0.557 | 0.28 | 2.0400e+00 |
| procedure_combinationsS | 4.2928e+08 | 1410.71 | 0.01409 | 0.989 | 0.00 | 2.8936e+218 |
| is_surgeryTRUE | | | | | | |
| (Intercept) | 6.6200e+00 | 1.48 | 1.27554 | 0.202 | 0.41 | 1.6071e+02 |
Inference
Create robust CIs using the sandwich estimator.
# Calculate robust standard errors
robust_se <- sandwich::vcovHC(fit, type = "HC0")
# Compute robust confidence intervals
robust_ci <- lmtest::coeftest(fit, vcov. = robust_se)
Stylize results.
# Summarize model coefficients
fit_summary <-
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(exponentiate = F, conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
multivariable_pvalue <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
# Print
multivariable_pvalue %>% sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| procedure_combinationsS | 4.2928e+08 | 0.81 | 24.54652 | 0.000 | 8.7789e+07 | 2.0991e+09 |
| procedure_combinationsMultimodal | 1.9890e+01 | 0.84 | 3.56475 | 0.000 | 3.8400e+00 | 1.0299e+02 |
| spetzler_martin_grade | 5.0000e-01 | 0.23 | -3.00684 | 0.003 | 3.2000e-01 | 7.9000e-01 |
| procedure_combinationsR | 7.8800e+00 | 0.91 | 2.27989 | 0.023 | 1.3400e+00 | 4.6460e+01 |
| locationDeep | 2.7000e-01 | 0.86 | -1.53515 | 0.125 | 5.0000e-02 | 1.4400e+00 |
| has_hemorrhageTRUE | 1.8000e+00 | 0.40 | 1.46135 | 0.144 | 8.2000e-01 | 3.9700e+00 |
| locationHemispheric | 2.9000e-01 | 0.85 | -1.45986 | 0.144 | 6.0000e-02 | 1.5300e+00 |
| has_deficitTRUE | 6.2000e-01 | 0.41 | -1.17976 | 0.238 | 2.7000e-01 | 1.3800e+00 |
| is_diffuse_nidusTRUE | 6.2000e-01 | 0.53 | -0.88779 | 0.375 | 2.2000e-01 | 1.7700e+00 |
| has_associated_aneurysmTRUE | 7.4000e-01 | 0.50 | -0.58700 | 0.557 | 2.8000e-01 | 2.0000e+00 |
| (Intercept) | 6.6200e+00 | 1.07 | 1.77459 | 0.076 | 8.2000e-01 | 5.3380e+01 |
Plot the coefficients with their confidence intervals.
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(exponentiate = F, conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
filter(term != "(Intercept)") %>%
arrange(desc(estimate)) %>%
mutate(term = factor(term, levels = term)) %>%
ggplot(aes(x = estimate, y = term, color = term)) +
geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
geom_vline(xintercept = 1, linetype = 2, alpha = 0.5) +
labs(x = "Odds ratio", y = NULL) +
guides(color = F) +
theme(
axis.text = element_text(size = 12),
axis.title = element_text(size = 13)
)
Predict
Predict.
Plot histogram of predictions.
tibble(
Predictions = predicted_probs,
Truth = fit$model[, CHOSEN_OUTCOME]
) %>%
pivot_longer(
cols = everything(),
names_to = "keys",
values_to = "values"
) %>%
mutate(keys = fct_inorder(keys)) %>%
ggplot(aes(x = values, color = keys, fill = keys)) +
geom_histogram(alpha = 0.7) +
geom_vline(xintercept = 0.5, linetype = 2, alpha = 0.5) +
facet_wrap(vars(keys), ncol = 1, scales = "fixed") +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(size = 12),
strip.text.x = element_text(size = 13),
legend.position = "none"
)
Evaluate
How many examples were used?
# All examples
num_all_examples <- nrow(fit$data)
num_all_examples_positive <- sum(fit$data[, CHOSEN_OUTCOME], na.rm = T)
num_complete_cases <- nrow(fit$model)
num_complete_cases_positive <- sum(fit$model[, CHOSEN_OUTCOME], na.rm = T)
# Print
sprintf(
"
All examples = %s (%s with outcome)
Fitted examples = %s (%s with outcome)",
num_all_examples,
num_all_examples_positive,
num_complete_cases,
num_complete_cases_positive
) %>% cat()
All examples = 228 (178 with outcome)
Fitted examples = 223 (175 with outcome)
Residual analysis.
Pseudo R-squared.
fitting null model for pseudo-r2
llh llhNull G2 McFadden r2ML r2CU
-80.85057 -116.14411 70.58709 0.30388 0.27133 0.41928
ROC-AUC.
# Compute
auroc <- cvAUC::ci.cvAUC(
predictions = predicted_probs,
labels = fit$model[, CHOSEN_OUTCOME]
)
# Print
sprintf(
"Cross-validated AUROC = %.3f (SE, %.3f; %s%% CI, %.3f-%.3f)",
auroc$cvAUC,
auroc$se,
auroc$confidence * 100,
auroc$ci[1],
auroc$ci[2]
)
[1] "Cross-validated AUROC = 0.847 (SE, 0.029; 95% CI, 0.791-0.904)"
PR-AUC.
# Construct object
precrec_obj_tr <- precrec::evalmod(
scores = predicted_probs,
labels = fit$model[, CHOSEN_OUTCOME],
calc_avg = F,
cb_alpha = 0.05
)
# Print
# NOTE: Can use precrec::auc_ci to get CIs, but need to have multiple datasets
precrec_obj_tr %>%
precrec::part() %>%
precrec::pauc() %>%
sable()
| modnames | dsids | curvetypes | paucs | spaucs |
|---|---|---|---|---|
| m1 | 1 | ROC | 0.84732 | 0.84732 |
| m1 | 1 | PRC | 0.95422 | 0.95422 |
ROC and PR curves.
Another version of the ROC curve.
# Calculate ROC curve
roc_curve <- pROC::roc(fit$model[, CHOSEN_OUTCOME] ~ predicted_probs)
# Plot ROC curve
plot(roc_curve)
Area under the curve: 0.847
Plot all performance measures.
# Construct object
precrec_obj_tr2 <- precrec::evalmod(
scores = predicted_probs,
labels = fit$model[, CHOSEN_OUTCOME],
mode = "basic"
)
# Plot
precrec_obj_tr2 %>% autoplot()
Confusion matrix.
Kappa statistic: 0 = no agreement; 0.1-0.2 = slight agreement; 0.2-0.4 = fair agreement; 0.4-0.6 = moderate agreement; 0.6-0.8 = substantial agreement; 0.8-1.0 = near-perfect agreement.
No Information Rate (NIR): the proportion of the largest observed class. For example, if 23 of 35 patients had the outcome, that is the largest class and the NIR is 23/35 = 0.657. The larger the deviation of the accuracy from the NIR, the more confident we are that the model is not simply predicting the largest class.
# Scores
pred <- factor(ifelse(predicted_probs > 0.5, "Event", "No Event"))
truth <- factor(ifelse(fit$model[, CHOSEN_OUTCOME], "Event", "No Event"))
# Count values per category
xtab <- table(
scores = pred,
labels = truth
)
# Compute
confusion_matrix <-
caret::confusionMatrix(
xtab,
positive = "Event",
prevalence = NULL, # Provide prevalence as a proportion here if you want
mode = "sens_spec"
)
# Print
confusion_matrix
Confusion Matrix and Statistics
labels
scores Event No Event
Event 165 25
No Event 10 23
Accuracy : 0.843
95% CI : (0.789, 0.888)
No Information Rate : 0.785
P-Value [Acc > NIR] : 0.0182
Kappa : 0.476
Mcnemar's Test P-Value : 0.0180
Sensitivity : 0.943
Specificity : 0.479
Pos Pred Value : 0.868
Neg Pred Value : 0.697
Prevalence : 0.785
Detection Rate : 0.740
Detection Prevalence : 0.852
Balanced Accuracy : 0.711
'Positive' Class : Event
Variable importance
Plot importance (based on magnitude of z-score).
# Calculate importance
varimp <-
fit %>%
caret::varImp() %>%
as_tibble(rownames = "rn") %>%
mutate(Predictor = factor(rn) %>% fct_reorder(Overall)) %>%
dplyr::select(Predictor, Importance = Overall)
# Plot variable importance
varimp %>%
ggplot(aes(Predictor, Importance, fill = Importance)) +
geom_bar(stat = "identity", alpha = 0.9, width = 0.65) +
coord_flip() +
guides(fill = F) +
labs(y = "\nVariable Importance", x = NULL) +
scale_fill_viridis_c() +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
panel.border = element_blank(),
axis.text.x = element_text(margin = margin(t = 7), size = 12),
axis.text.y = element_text(margin = margin(r = -10), size = 12),
axis.title = element_text(size = 13)
)
Multivariable - With interactions
Presents with hemorrhage
Fit logistic.
# Define data
df <- df_multi
# Create formula
model <- as.formula(paste(
CHOSEN_OUTCOME,
"~",
"modified_rankin_score_pretreatment*has_hemorrhage +
spetzler_martin_grade*has_hemorrhage +
age_at_first_treatment_yrs*has_hemorrhage"
))
fit <- glm(
model,
data = df,
family = binomial()
)
Print results.
# Summarize model coefficients
fit_summary <-
fit %>%
broom::tidy(exponentiate = T, conf.int = T) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
stylized <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
# Print
stylized %>% sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| spetzler_martin_grade | 0.38 | 0.25 | -3.77844 | 0.000 | 0.22 | 0.61 |
| modified_rankin_score_pretreatment:has_hemorrhageTRUE | 0.55 | 0.36 | -1.64993 | 0.099 | 0.26 | 1.10 |
| modified_rankin_score_pretreatment | 1.57 | 0.32 | 1.41656 | 0.157 | 0.86 | 3.05 |
| has_hemorrhageTRUE | 6.59 | 1.73 | 1.09284 | 0.274 | 0.22 | 204.87 |
| age_at_first_treatment_yrs | 1.05 | 0.06 | 0.84770 | 0.397 | 0.93 | 1.19 |
| has_hemorrhageTRUE:age_at_first_treatment_yrs | 0.95 | 0.10 | -0.55731 | 0.577 | 0.78 | 1.15 |
| has_hemorrhageTRUE:spetzler_martin_grade | 1.08 | 0.36 | 0.22000 | 0.826 | 0.54 | 2.20 |
| (Intercept) | 18.21 | 1.17 | 2.48180 | 0.013 | 2.20 | 224.85 |
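To read the interaction rows above: the main-effect OR applies to the `has_hemorrhage = FALSE` stratum, and the stratum-specific OR under hemorrhage is the product of the main-effect and interaction ORs. A sketch with the rounded estimates from the table (illustration only, ignoring the joint uncertainty of the two estimates):

```r
# Simple-slopes reading of the Spetzler-Martin grade OR by hemorrhage status,
# using the rounded estimates reported above
or_sm_main <- 0.38   # OR per grade, has_hemorrhage = FALSE
or_sm_x_hem <- 1.08  # interaction OR
or_sm_hem <- or_sm_main * or_sm_x_hem  # OR per grade, has_hemorrhage = TRUE
round(or_sm_hem, 2)  # 0.41
```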
Model comparison
Special cases
SM < 4
Select features.
# Define p-value threshold
pval_threshold <- 0.2
# Get the predictors of interest
predictors <-
univariable_adjusted_recoded %>%
bind_rows(univariable_unadjusted_recoded) %>%
filter(`P-values` < pval_threshold) %>%
pull(`Predictors`)
# Clean categorical variables
predictors <-
predictors %>%
str_replace("^([a-z0-9_]*)([A-Z].*)$", "\\1") %>%
unique()
# Add adjustment variables (none here) and deduplicate
predictors <- unique(c(predictors))
# Remove unwanted predictors
predictors <- predictors[!predictors %in% unwanted_with_gradings]
# Replace SM by its binary variant
predictors <- c(predictors, "is_spetzler_martin_grade_less_than_4")
predictors <- predictors[!predictors == "spetzler_martin_grade"]
# Print
print(predictors)
[1] "is_diffuse_nidus" "has_hemorrhage"
[3] "location" "has_deficit"
[5] "procedure_combinations" "has_associated_aneurysm"
[7] "is_spetzler_martin_grade_less_than_4"
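`is_spetzler_martin_grade_less_than_4` is created upstream of this notebook; judging by the name, it is presumably a simple threshold on the grade, along the lines of:

```r
# Presumed definition of the binary SM variant (an assumption based on the
# variable name; the actual derivation lives upstream in the pipeline)
grades <- c(1, 2, 3, 4, 5)
is_spetzler_martin_grade_less_than_4 <- grades < 4
is_spetzler_martin_grade_less_than_4  # TRUE TRUE TRUE FALSE FALSE
```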
Fit logistic.
# Define data
df <- df_multi %>% drop_na(modified_rankin_score_pretreatment, location)
# Create formula
model <- as.formula(paste(
CHOSEN_OUTCOME,
"~",
paste(predictors, collapse = " + ")
))
fit <- glm(
model,
data = df,
family = binomial()
)
# Save (note: this reuses the fit_without_grading name)
fit_without_grading <- fit
Print results.
# Summarize model coefficients
fit_summary <-
fit %>%
broom::tidy(exponentiate = T, conf.int = T) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
stylized <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
# Print
stylized %>% sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| procedure_combinationsMultimodal | 1.4190e+01 | 1.19 | 2.23496 | 0.025 | 1.53 | 1.9091e+02 |
| locationDeep | 1.9000e-01 | 0.88 | -1.91273 | 0.056 | 0.02 | 8.5000e-01 |
| is_diffuse_nidusTRUE | 4.2000e-01 | 0.52 | -1.66154 | 0.097 | 0.15 | 1.1700e+00 |
| is_spetzler_martin_grade_less_than_4TRUE | 2.0700e+00 | 0.45 | 1.61798 | 0.106 | 0.85 | 5.0300e+00 |
| procedure_combinationsR | 6.3700e+00 | 1.22 | 1.51465 | 0.130 | 0.64 | 9.0260e+01 |
| locationHemispheric | 2.7000e-01 | 0.86 | -1.50365 | 0.133 | 0.03 | 1.2000e+00 |
| has_hemorrhageTRUE | 1.8200e+00 | 0.40 | 1.48815 | 0.137 | 0.83 | 4.0400e+00 |
| has_deficitTRUE | 6.5000e-01 | 0.40 | -1.07200 | 0.284 | 0.30 | 1.4400e+00 |
| has_associated_aneurysmTRUE | 7.1000e-01 | 0.49 | -0.68499 | 0.493 | 0.28 | 1.9100e+00 |
| procedure_combinationsS | 1.9502e+08 | 878.36 | 0.02173 | 0.983 | 0.00 | 2.1708e+138 |
| (Intercept) | 7.3000e-01 | 1.29 | -0.24738 | 0.805 | 0.06 | 1.0680e+01 |
Create robust CIs using the sandwich estimator.
# Calculate robust standard errors
robust_se <- sandwich::vcovHC(fit, type = "HC0")
# Compute robust confidence intervals
robust_ci <- lmtest::coeftest(fit, vcov. = robust_se)
Stylize results.
# Summarize model coefficients
fit_summary <-
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(exponentiate = F, conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
multivariable_pvalue <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
# Print
multivariable_pvalue %>% sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| procedure_combinationsS | 1.9502e+08 | 0.84 | 22.75592 | 0.000 | 3.7675e+07 | 1.0095e+09 |
| procedure_combinationsMultimodal | 1.4190e+01 | 0.83 | 3.19079 | 0.001 | 2.7800e+00 | 7.2400e+01 |
| procedure_combinationsR | 6.3700e+00 | 0.89 | 2.07584 | 0.038 | 1.1100e+00 | 3.6620e+01 |
| locationDeep | 1.9000e-01 | 0.93 | -1.80418 | 0.071 | 3.0000e-02 | 1.1600e+00 |
| is_diffuse_nidusTRUE | 4.2000e-01 | 0.52 | -1.67381 | 0.094 | 1.5000e-01 | 1.1600e+00 |
| is_spetzler_martin_grade_less_than_4TRUE | 2.0700e+00 | 0.45 | 1.60784 | 0.108 | 8.5000e-01 | 5.0400e+00 |
| has_hemorrhageTRUE | 1.8200e+00 | 0.40 | 1.47647 | 0.140 | 8.2000e-01 | 4.0100e+00 |
| locationHemispheric | 2.7000e-01 | 0.91 | -1.43094 | 0.152 | 5.0000e-02 | 1.6200e+00 |
| has_deficitTRUE | 6.5000e-01 | 0.40 | -1.07427 | 0.283 | 3.0000e-01 | 1.4300e+00 |
| has_associated_aneurysmTRUE | 7.1000e-01 | 0.49 | -0.69166 | 0.489 | 2.8000e-01 | 1.8500e+00 |
| (Intercept) | 7.3000e-01 | 1.02 | -0.31515 | 0.753 | 1.0000e-01 | 5.3100e+00 |
Plot the coefficients with their confidence intervals.
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(exponentiate = F, conf.int = T) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
filter(term != "(Intercept)") %>%
arrange(desc(estimate)) %>%
mutate(term = factor(term, levels = term)) %>%
ggplot(aes(x = estimate, y = term, color = term)) +
geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
geom_vline(xintercept = 1, linetype = 2, alpha = 0.5) +
labs(x = "Odds ratio", y = NULL) +
guides(color = F) +
theme(
axis.text = element_text(size = 12),
axis.title = element_text(size = 13)
)
Had surgery
Select features.
# Define p-value threshold
pval_threshold <- 0.2
# Get the predictors of interest
predictors <-
univariable_adjusted_recoded %>%
bind_rows(univariable_unadjusted_recoded) %>%
filter(`P-values` < pval_threshold) %>%
pull(`Predictors`)
# Clean categorical variables
predictors <-
predictors %>%
str_replace("^([a-z0-9_]*)([A-Z].*)$", "\\1") %>%
unique()
# Add adjustment variables (none here) and deduplicate
predictors <- unique(c(predictors))
# Remove unwanted predictors
predictors <- predictors[!predictors %in% unwanted_with_gradings]
# Replace procedure combinations by the binary surgery variant
predictors <- c(predictors, "is_surgery")
predictors <- predictors[!predictors == "procedure_combinations"]
# Print
print(predictors)
[1] "spetzler_martin_grade" "is_diffuse_nidus" "has_hemorrhage"
[4] "location" "has_deficit" "has_associated_aneurysm"
[7] "is_surgery"
Fit logistic.
# Define data
df <- df_multi %>% drop_na(modified_rankin_score_pretreatment, location)
# Create formula
model <- as.formula(paste(
CHOSEN_OUTCOME,
"~",
paste(predictors, collapse = " + ")
))
fit <- glm(
model,
data = df,
family = binomial()
)
# Save (note: this reuses the fit_without_grading name)
fit_without_grading <- fit
Print results.
# Summarize model coefficients
fit_summary <-
fit %>%
broom::tidy(exponentiate = T, conf.int = T) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
stylized <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
# Print
stylized %>% sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| spetzler_martin_grade | 5.5000e-01 | 0.23 | -2.64812 | 0.008 | 0.35 | 8.4000e-01 |
| locationDeep | 2.7000e-01 | 0.85 | -1.54021 | 0.124 | 0.04 | 1.2100e+00 |
| has_hemorrhageTRUE | 1.8100e+00 | 0.39 | 1.50991 | 0.131 | 0.84 | 3.9400e+00 |
| locationHemispheric | 4.1000e-01 | 0.83 | -1.07502 | 0.282 | 0.06 | 1.7500e+00 |
| has_deficitTRUE | 6.6000e-01 | 0.40 | -1.04939 | 0.294 | 0.30 | 1.4500e+00 |
| is_diffuse_nidusTRUE | 6.5000e-01 | 0.53 | -0.81817 | 0.413 | 0.23 | 1.8500e+00 |
| has_associated_aneurysmTRUE | 7.1000e-01 | 0.47 | -0.74099 | 0.459 | 0.28 | 1.8000e+00 |
| is_surgeryTRUE | 1.1434e+07 | 863.18 | 0.01883 | 0.985 | 0.00 | 3.6053e+130 |
| (Intercept) | 5.6000e+01 | 1.13 | 3.54785 | 0.000 | 7.54 | 6.9535e+02 |
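The `is_surgeryTRUE` row (OR ≈ 1.1e7, SE 863, CI spanning some 130 orders of magnitude) is the classic signature of quasi-complete separation: a predictor level almost perfectly determines the outcome, so the maximum-likelihood estimate diverges. A minimal simulated reproduction (illustrative data, not the study data); Firth-penalized regression (e.g. the `logistf` package) is a common remedy worth considering:

```r
# Simulate quasi-complete separation: x == 1 always yields y == 1
set.seed(1)
x <- rep(c(0, 1), each = 20)
y <- ifelse(x == 1, 1L, rbinom(length(x), 1, 0.5))
fit_sep <- suppressWarnings(glm(y ~ x, family = binomial()))
# The slope estimate walks off toward +Inf; the giveaway is the enormous SE
se_x <- summary(fit_sep)$coefficients["x", "Std. Error"]
print(se_x)  # several thousand, as for is_surgeryTRUE above
```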
Create robust CIs using the sandwich estimator.
# Calculate the robust (HC0) variance-covariance matrix
robust_vcov <- sandwich::vcovHC(fit, type = "HC0")
# Compute robust coefficient tests from it
robust_ci <- lmtest::coeftest(fit, vcov. = robust_vcov)
Stylize results.
# Summarize model coefficients
fit_summary <-
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(exponentiate = FALSE, conf.int = TRUE) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
arrange(term == "(Intercept)", p.value) %>%
rename(odds_ratio = estimate, z_value = statistic)
# Stylize and print
multivariable_pvalue <-
fit_summary %>%
rename(
"Predictors" = term,
"Odds Ratios (OR)" = odds_ratio,
"SE" = std.error,
"Z-scores" = z_value,
"P-values" = p.value,
"CI (low)" = conf.low,
"CI (high)" = conf.high,
) %>%
mutate(
`Odds Ratios (OR)` = round(`Odds Ratios (OR)`, 2),
`SE` = round(`SE`, 2),
`CI (low)` = round(`CI (low)`, 2),
`CI (high)` = round(`CI (high)`, 2),
`P-values` = round(`P-values`, 3),
)
# Print
multivariable_pvalue %>% sable()
| Predictors | Odds Ratios (OR) | SE | Z-scores | P-values | CI (low) | CI (high) |
|---|---|---|---|---|---|---|
| is_surgeryTRUE | 1.1434e+07 | 0.38 | 43.19075 | 0.000 | 5468706.59 | 2.3904e+07 |
| spetzler_martin_grade | 5.5000e-01 | 0.21 | -2.89717 | 0.004 | 0.37 | 8.2000e-01 |
| locationDeep | 2.7000e-01 | 0.78 | -1.67266 | 0.094 | 0.06 | 1.2500e+00 |
| has_hemorrhageTRUE | 1.8100e+00 | 0.39 | 1.51604 | 0.130 | 0.84 | 3.8700e+00 |
| locationHemispheric | 4.1000e-01 | 0.79 | -1.13739 | 0.255 | 0.09 | 1.9100e+00 |
| has_deficitTRUE | 6.6000e-01 | 0.40 | -1.03820 | 0.299 | 0.30 | 1.4500e+00 |
| is_diffuse_nidusTRUE | 6.5000e-01 | 0.51 | -0.84556 | 0.398 | 0.24 | 1.7600e+00 |
| has_associated_aneurysmTRUE | 7.1000e-01 | 0.48 | -0.71831 | 0.473 | 0.28 | 1.8200e+00 |
| (Intercept) | 5.6000e+01 | 1.16 | 3.48016 | 0.001 | 5.80 | 5.4050e+02 |
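For reference, what `sandwich::vcovHC(type = "HC0")` computes for a canonical-link GLM can be written out by hand: the model-based covariance (the "bread") wraps the empirical score cross-product (the "meat"). A base-R sketch on a toy `mtcars` fit (illustrative only; the real analysis uses `fit` above):

```r
# HC0 sandwich covariance for a logistic fit, assembled manually
fit_toy <- glm(am ~ wt, data = mtcars, family = binomial())
X <- model.matrix(fit_toy)
scores <- X * (fit_toy$y - fitted(fit_toy))  # per-observation score rows
meat <- crossprod(scores)                    # sum of score outer products
bread <- vcov(fit_toy)                       # inverse observed information
vc_hc0 <- bread %*% meat %*% bread           # should match vcovHC(fit_toy, "HC0")
se_robust <- sqrt(diag(vc_hc0))
# Robust Wald CIs on the odds-ratio scale, as in the table above
or_table <- exp(cbind(
  OR   = coef(fit_toy),
  low  = coef(fit_toy) - qnorm(0.975) * se_robust,
  high = coef(fit_toy) + qnorm(0.975) * se_robust
))
```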
Plot the coefficients with their confidence intervals.
robust_ci %>%
# broom does not support exponentiation after coeftest so do it manually
broom::tidy(exponentiate = FALSE, conf.int = TRUE) %>%
mutate(across(c(estimate, "conf.low", "conf.high"), exp)) %>%
filter(term != "(Intercept)") %>%
arrange(desc(estimate)) %>%
mutate(term = factor(term, levels = term)) %>%
ggplot(aes(x = estimate, y = term, color = term)) +
geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
geom_vline(xintercept = 1, linetype = 2, alpha = 0.5) +
labs(x = "Odds ratio", y = NULL) +
guides(color = "none") +
theme(
axis.text = element_text(size = 12),
axis.title = element_text(size = 13)
)
Grade 5s
df_uni %>%
drop_na(spetzler_martin_grade) %>%
count(is_poor_outcome, spetzler_martin_grade, name = "num_patients") %>%
mutate(
pct_patients = scales::percent(num_patients / sum(num_patients), 1)
) %>%
sable()
| is_poor_outcome | spetzler_martin_grade | num_patients | pct_patients |
|---|---|---|---|
| FALSE | 1 | 38 | 17% |
| FALSE | 2 | 45 | 20% |
| FALSE | 3 | 65 | 29% |
| FALSE | 4 | 37 | 16% |
| FALSE | 5 | 17 | 7% |
| TRUE | 1 | 2 | 1% |
| TRUE | 2 | 3 | 1% |
| TRUE | 3 | 6 | 3% |
| TRUE | 4 | 7 | 3% |
| TRUE | 5 | 8 | 4% |
Only consider those with SM grade 5.
df_uni %>%
filter(spetzler_martin_grade == 5) %>%
count(is_poor_outcome, spetzler_martin_grade, name = "num_patients") %>%
mutate(
pct_patients = scales::percent(num_patients / sum(num_patients), 1)
) %>%
sable()
| is_poor_outcome | spetzler_martin_grade | num_patients | pct_patients |
|---|---|---|---|
| FALSE | 5 | 17 | 68% |
| TRUE | 5 | 8 | 32% |
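With only 25 grade-5 patients, the 32% poor-outcome rate is imprecise, and an exact (Clopper-Pearson) interval makes that explicit. A base-R sketch using the counts from the table above:

```r
# Exact binomial CI for 8 poor outcomes among 25 grade-5 patients
ci <- binom.test(x = 8, n = 25)$conf.int
round(ci, 2)  # a wide interval despite a point estimate of 0.32
```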
Honest estimation (WIP)
https://www.pnas.org/doi/full/10.1073/pnas.1510489113
https://arxiv.org/pdf/1510.04342
https://github.com/grf-labs/grf
https://grf-labs.github.io/grf/REFERENCE.html
https://cran.r-project.org/web/packages/hettx/vignettes/detect_idiosyncratic_vignette.html
Write
Setup
Create the necessary directories.
# Get today's date
today <- Sys.Date()
today <- format(today, "%Y-%m-%d")
# Create folder names
base_folder <- file.path(DST_DIRNAME, ANALYSIS_NAME)
date_folder <- file.path(base_folder, today)
csv_folder <- file.path(date_folder, "csv")
pdf_folder <- file.path(date_folder, "pdf")
html_folder <- file.path(date_folder, "html")
# Create folders
suppressWarnings(dir.create(base_folder))
suppressWarnings(dir.create(date_folder))
suppressWarnings(dir.create(csv_folder))
suppressWarnings(dir.create(pdf_folder))
suppressWarnings(dir.create(html_folder))
# Print folder names
print(base_folder)
print(date_folder)
print(csv_folder)
print(pdf_folder)
print(html_folder)
[1] "../../outputs//predictive-analytics/is-obliterated"
[1] "../../outputs//predictive-analytics/is-obliterated/2024-07-30"
[1] "../../outputs//predictive-analytics/is-obliterated/2024-07-30/csv"
[1] "../../outputs//predictive-analytics/is-obliterated/2024-07-30/pdf"
[1] "../../outputs//predictive-analytics/is-obliterated/2024-07-30/html"
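The five `dir.create()` calls above can be collapsed: `recursive = TRUE` creates any missing intermediate directories, and `showWarnings = FALSE` replaces the `suppressWarnings()` wrappers. A sketch against a temporary directory (folder names illustrative):

```r
# Create the whole csv/pdf/html tree in one pass
root <- file.path(tempdir(), "outputs-sketch", format(Sys.Date(), "%Y-%m-%d"))
for (subdir in c("csv", "pdf", "html")) {
  dir.create(file.path(root, subdir), recursive = TRUE, showWarnings = FALSE)
}
print(dir.exists(file.path(root, c("csv", "pdf", "html"))))  # TRUE TRUE TRUE
```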
Figures
Write all figures.
Tables
Write all tables.
# # Arguments
# df <- tables$desc_stats_cohorts_cv_prolaio
# filepath_csv <- file.path(csv_folder, "desc-stats_by-cohort_cov.csv")
# filepath_html <- file.path(html_folder, "desc-stats_by-cohort_cov.html")
#
# # Save as CSV
# write_csv(
# x = df,
# file = filepath_csv
# )
#
# # Save as HTML
# # - Save pdf does not work with webshot
# # - It works with pagedown but not as pretty as desired
# df %>%
# sable() %>%
# kableExtra::save_kable(file = filepath_html)
Reproducibility
Linting and styling
Dependency management
Documentation
Session Info
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.5
Matrix products: default
BLAS:
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;
LAPACK version 3.11.0
attached base packages:
[1] parallel stats graphics grDevices datasets utils methods base
other attached packages:
[1] rmarkdown_2.25 table1_1.4.3 shiny_1.8.0
[4] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[7] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[10] tidyr_1.3.1 tibble_3.2.1 tidyverse_2.0.0
[13] selectiveInference_1.2.5 MASS_7.3-60 adaptMCMC_1.5
[16] coda_0.19-4.1 survival_3.5-7 intervals_0.15.4
[19] glmnet_4.1-8 Matrix_1.6-1.1 magrittr_2.0.3
[22] ggplot2_3.5.0 kableExtra_1.4.0 rmdformats_1.0.4
[25] knitr_1.45
loaded via a namespace (and not attached):
[1] splines_4.3.2 later_1.3.2 versions_0.3
[4] R.oo_1.26.0 hardhat_1.4.0 pROC_1.18.5
[7] rpart_4.1.21 rex_1.2.1 lifecycle_1.0.4
[10] tcltk_4.3.2 globals_0.16.3 processx_3.8.3
[13] lattice_0.21-9 vroom_1.6.5 backports_1.4.1
[16] sass_0.4.8 jquerylib_0.1.4 yaml_2.3.8
[19] remotes_2.4.2.1 httpuv_1.6.14 summarytools_1.0.1
[22] pkgload_1.3.4 R.cache_0.16.0 R.utils_2.12.3
[25] styler_1.10.2 nnet_7.3-19 sandwich_3.1-0
[28] ipred_0.9-14 lava_1.8.0 listenv_0.9.1
[31] parallelly_1.37.1 svglite_2.1.3 codetools_0.2-19
[34] xml2_1.3.6 tidyselect_1.2.1 precrec_0.14.4
[37] shape_1.4.6.1 futile.logger_1.4.3 farver_2.1.1
[40] matrixStats_1.3.0 stats4_4.3.2 base64enc_0.1-3
[43] jsonlite_1.8.8 caret_6.0-94 e1071_1.7-14
[46] ellipsis_0.3.2 Formula_1.2-5 iterators_1.0.14
[49] systemfonts_1.0.5 foreach_1.5.2 tools_4.3.2
[52] pryr_0.1.6 Rcpp_1.0.12 glue_1.7.0
[55] gridExtra_2.3 prodlim_2023.08.28 xfun_0.42
[58] withr_3.0.0 formatR_1.14 BiocManager_1.30.22
[61] fastmap_1.1.1 fansi_1.0.6 callr_3.7.5
[64] digest_0.6.34 lintr_3.1.1 timechange_0.3.0
[67] R6_2.5.1 mime_0.12 colorspace_2.1-0
[70] R.methodsS3_1.8.2 utf8_1.2.4 generics_0.1.3
[73] renv_1.0.4 data.table_1.15.4 recipes_1.0.10
[76] class_7.3-22 stevedore_0.9.6 ModelMetrics_1.2.2.2
[79] pkgconfig_2.0.3 gtable_0.3.4 timeDate_4032.109
[82] lmtest_0.9-40 containerit_0.6.0.9004 htmltools_0.5.7
[85] bookdown_0.39 scales_1.3.0 cyclocomp_1.1.1
[88] gower_1.0.1 lambda.r_1.2.4 rstudioapi_0.15.0
[91] tzdb_0.4.0 reshape2_1.4.4 checkmate_2.3.1
[94] nlme_3.1-163 curl_5.2.1 ggcorrplot_0.1.4.1
[97] proxy_0.4-27 zoo_1.8-12 cachem_1.0.8
[100] miniUI_0.1.1.1 desc_1.4.3 pillar_1.9.0
[103] grid_4.3.2 vctrs_0.6.5 pscl_1.5.9
[106] promises_1.2.1 shinyFiles_0.9.3 xtable_1.8-4
[109] evaluate_0.23 cvAUC_1.1.4 magick_2.8.3
[112] cli_3.6.2 compiler_4.3.2 futile.options_1.0.1
[115] rlang_1.1.3 crayon_1.5.2 future.apply_1.11.2
[118] labeling_0.4.3 ps_1.7.6 plyr_1.8.9
[121] fs_1.6.3 stringi_1.8.3 pander_0.6.5
[124] viridisLite_0.4.2 assertthat_0.2.1 munsell_0.5.0
[127] lazyeval_0.2.2 rapportools_1.1 hms_1.1.3
[130] bit64_4.0.5 future_1.33.2 highr_0.10
[133] ROCR_1.0-11 semver_0.2.0 broom_1.0.6
[136] memoise_2.0.1 bslib_0.6.1 bit_4.0.5
References
[[1]]
Scheidegger A (2024). _adaptMCMC: Implementation of a Generic Adaptive Monte
Carlo Markov Chain Sampler_. R package version 1.5,
<https://CRAN.R-project.org/package=adaptMCMC>.
[[2]]
R Core Team (2023). _R: A Language and Environment for Statistical Computing_.
R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
[[3]]
Plummer M, Best N, Cowles K, Vines K (2006). "CODA: Convergence Diagnosis and
Output Analysis for MCMC." _R News_, *6*(1), 7-11.
<https://journal.r-project.org/archive/>.
[[4]]
R Core Team (2023). _R: A Language and Environment for Statistical Computing_.
R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
[[5]]
Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A Grammar
of Data Manipulation_. R package version 1.1.4,
<https://CRAN.R-project.org/package=dplyr>.
[[6]]
Wickham H (2023). _forcats: Tools for Working with Categorical Variables
(Factors)_. R package version 1.0.0,
<https://CRAN.R-project.org/package=forcats>.
[[7]]
Wickham H (2016). _ggplot2: Elegant Graphics for Data Analysis_.
Springer-Verlag New York. ISBN 978-3-319-24277-4,
<https://ggplot2.tidyverse.org>.
[[8]]
Friedman J, Tibshirani R, Hastie T (2010). "Regularization Paths for
Generalized Linear Models via Coordinate Descent." _Journal of Statistical
Software_, *33*(1), 1-22. doi:10.18637/jss.v033.i01
<https://doi.org/10.18637/jss.v033.i01>.
Simon N, Friedman J, Tibshirani R, Hastie T (2011). "Regularization Paths for
Cox's Proportional Hazards Model via Coordinate Descent." _Journal of
Statistical Software_, *39*(5), 1-13. doi:10.18637/jss.v039.i05
<https://doi.org/10.18637/jss.v039.i05>.
Tay JK, Narasimhan B, Hastie T (2023). "Elastic Net Regularization Paths for
All Generalized Linear Models." _Journal of Statistical Software_, *106*(1),
1-31. doi:10.18637/jss.v106.i01 <https://doi.org/10.18637/jss.v106.i01>.
[[9]]
R Core Team (2023). _R: A Language and Environment for Statistical Computing_.
R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
[[10]]
R Core Team (2023). _R: A Language and Environment for Statistical Computing_.
R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
[[11]]
Bourgon R (2023). _intervals: Tools for Working with Points and Intervals_. R
package version 0.15.4, <https://CRAN.R-project.org/package=intervals>.
[[12]]
Zhu H (2024). _kableExtra: Construct Complex Table with 'kable' and Pipe
Syntax_. R package version 1.4.0,
<https://CRAN.R-project.org/package=kableExtra>.
[[13]]
Xie Y (2023). _knitr: A General-Purpose Package for Dynamic Report Generation
in R_. R package version 1.45, <https://yihui.org/knitr/>.
Xie Y (2015). _Dynamic Documents with R and knitr_, 2nd edition. Chapman and
Hall/CRC, Boca Raton, Florida. ISBN 978-1498716963, <https://yihui.org/knitr/>.
Xie Y (2014). "knitr: A Comprehensive Tool for Reproducible Research in R." In
Stodden V, Leisch F, Peng RD (eds.), _Implementing Reproducible Computational
Research_. Chapman and Hall/CRC. ISBN 978-1466561595.
[[14]]
Grolemund G, Wickham H (2011). "Dates and Times Made Easy with lubridate."
_Journal of Statistical Software_, *40*(3), 1-25.
<https://www.jstatsoft.org/v40/i03/>.
[[15]]
Bache S, Wickham H (2022). _magrittr: A Forward-Pipe Operator for R_. R package
version 2.0.3, <https://CRAN.R-project.org/package=magrittr>.
[[16]]
Venables WN, Ripley BD (2002). _Modern Applied Statistics with S_, Fourth
edition. Springer, New York. ISBN 0-387-95457-0,
<https://www.stats.ox.ac.uk/pub/MASS4/>.
[[17]]
Bates D, Maechler M, Jagan M (2023). _Matrix: Sparse and Dense Matrix Classes
and Methods_. R package version 1.6-1.1,
<https://CRAN.R-project.org/package=Matrix>.
[[18]]
R Core Team (2023). _R: A Language and Environment for Statistical Computing_.
R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
[[19]]
R Core Team (2023). _R: A Language and Environment for Statistical Computing_.
R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
[[20]]
Wickham H, Henry L (2023). _purrr: Functional Programming Tools_. R package
version 1.0.2, <https://CRAN.R-project.org/package=purrr>.
[[21]]
Wickham H, Hester J, Bryan J (2024). _readr: Read Rectangular Text Data_. R
package version 2.1.5, <https://CRAN.R-project.org/package=readr>.
[[22]]
Allaire J, Xie Y, Dervieux C, McPherson J, Luraschi J, Ushey K, Atkins A,
Wickham H, Cheng J, Chang W, Iannone R (2023). _rmarkdown: Dynamic Documents
for R_. R package version 2.25, <https://github.com/rstudio/rmarkdown>.
Xie Y, Allaire J, Grolemund G (2018). _R Markdown: The Definitive Guide_.
Chapman and Hall/CRC, Boca Raton, Florida. ISBN 9781138359338,
<https://bookdown.org/yihui/rmarkdown>.
Xie Y, Dervieux C, Riederer E (2020). _R Markdown Cookbook_. Chapman and
Hall/CRC, Boca Raton, Florida. ISBN 9780367563837,
<https://bookdown.org/yihui/rmarkdown-cookbook>.
[[23]]
Barnier J (2022). _rmdformats: HTML Output Formats and Templates for
'rmarkdown' Documents_. R package version 1.0.4,
<https://CRAN.R-project.org/package=rmdformats>.
[[24]]
Tibshirani R, Tibshirani R, Taylor J, Loftus J, Reid S, Markovic J (2019).
_selectiveInference: Tools for Post-Selection Inference_. R package version
1.2.5, <https://CRAN.R-project.org/package=selectiveInference>.
[[25]]
Chang W, Cheng J, Allaire J, Sievert C, Schloerke B, Xie Y, Allen J, McPherson
J, Dipert A, Borges B (2023). _shiny: Web Application Framework for R_. R
package version 1.8.0, <https://CRAN.R-project.org/package=shiny>.
[[26]]
R Core Team (2023). _R: A Language and Environment for Statistical Computing_.
R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
[[27]]
Wickham H (2023). _stringr: Simple, Consistent Wrappers for Common String
Operations_. R package version 1.5.1,
<https://CRAN.R-project.org/package=stringr>.
[[28]]
Therneau T (2023). _A Package for Survival Analysis in R_. R package version
3.5-7, <https://CRAN.R-project.org/package=survival>.
Terry M. Therneau, Patricia M. Grambsch (2000). _Modeling Survival Data:
Extending the Cox Model_. Springer, New York. ISBN 0-387-98784-3.
[[29]]
Rich B (2023). _table1: Tables of Descriptive Statistics in HTML_. R package
version 1.4.3, <https://CRAN.R-project.org/package=table1>.
[[30]]
Müller K, Wickham H (2023). _tibble: Simple Data Frames_. R package version
3.2.1, <https://CRAN.R-project.org/package=tibble>.
[[31]]
Wickham H, Vaughan D, Girlich M (2024). _tidyr: Tidy Messy Data_. R package
version 1.3.1, <https://CRAN.R-project.org/package=tidyr>.
[[32]]
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G,
Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K,
Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K,
Yutani H (2019). "Welcome to the tidyverse." _Journal of Open Source Software_,
*4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
[[33]]
R Core Team (2023). _R: A Language and Environment for Statistical Computing_.
R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.